Skip to content

Measure io writes for vector upsertions#5944

Merged
JojiiOfficial merged 5 commits intodevfrom
measure_vector_insert_io_writes
Mar 10, 2025
Merged

Measure io writes for vector upsertions#5944
JojiiOfficial merged 5 commits intodevfrom
measure_vector_insert_io_writes

Conversation

@JojiiOfficial
Copy link
Contributor

Depends on #5922

Adds measurement for vector storage upsertions

@generall generall force-pushed the measure_update_op_io branch from d73fe4f to 353f5f7 Compare February 13, 2025 15:46
@ffuugoo ffuugoo force-pushed the measure_update_op_io branch from 7b8d7e6 to d57cc2f Compare February 18, 2025 11:07
@JojiiOfficial JojiiOfficial force-pushed the measure_update_op_io branch 3 times, most recently from 2d0886a to c357963 Compare March 4, 2025 14:56
@JojiiOfficial JojiiOfficial force-pushed the measure_vector_insert_io_writes branch 2 times, most recently from e11a0f4 to 1bc7f4f Compare March 5, 2025 10:56
@JojiiOfficial JojiiOfficial force-pushed the measure_update_op_io branch from e7ac5f1 to 0e6aa45 Compare March 6, 2025 12:24
Base automatically changed from measure_update_op_io to dev March 6, 2025 14:03
@JojiiOfficial JojiiOfficial force-pushed the measure_vector_insert_io_writes branch from 1bc7f4f to 2b0c8bb Compare March 6, 2025 14:13
@JojiiOfficial JojiiOfficial requested a review from timvisee March 6, 2025 14:13
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2025

📝 Walkthrough

Walkthrough

The changes systematically update several modules by modifying existing method signatures to include an additional parameter of type &HardwareCounterCell. This new parameter is introduced in functions related to vector insertion and update—such as insert_vector and update_vector—across various vector storage implementations (dense, sparse, multi-dense, GPU, and others) as well as in benchmark, test, and integration files. In many cases, calls to these functions are modified to pass an instance of HardwareCounterCell, replacing prior placeholder or disposable instances. The updates also include renaming unused placeholders, adding new dependency imports, and incorporating a new utility function in a Python test fixture to handle point vector updates. Overall, the changes standardize the tracking of hardware counter metrics across the codebase.

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)

168-183: 🛠️ Refactor suggestion

Handle potential serialization errors instead of using unwrap()
Currently, serialization failures will panic. For consistency with lines 56-58 and improved robustness, it’s preferable to handle these errors gracefully. Consider applying this diff:

- let key_enc = bincode::serialize(&key).unwrap();
- let record_enc = bincode::serialize(&record).unwrap();
+ let key_enc = bincode::serialize(&key)
+     .map_err(|err| OperationError::service_error(format!("cannot serialize key: {err}")))?;
+ let record_enc = bincode::serialize(&record)
+     .map_err(|err| OperationError::service_error(format!("cannot serialize record: {err}")))?;
🧹 Nitpick comments (1)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)

229-241: Add a brief docstring for the new hw_counter parameter
To improve clarity and maintainability, consider documenting the intended usage and effects of hw_counter within insert_vector.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fded09 and 2b0c8bb.

📒 Files selected for processing (34)
  • lib/segment/benches/sparse_index_build.rs (2 hunks)
  • lib/segment/benches/sparse_vector_storage.rs (3 hunks)
  • lib/segment/benches/vector_search.rs (2 hunks)
  • lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
  • lib/segment/src/fixtures/sparse_fixtures.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (6 hunks)
  • lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (1 hunks)
  • lib/segment/src/index/plain_vector_index.rs (2 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (2 hunks)
  • lib/segment/src/index/vector_index_base.rs (3 hunks)
  • lib/segment/src/segment/entry.rs (3 hunks)
  • lib/segment/src/segment/segment_ops.rs (3 hunks)
  • lib/segment/src/segment/tests.rs (3 hunks)
  • lib/segment/src/vector_storage/chunked_mmap_vectors.rs (10 hunks)
  • lib/segment/src/vector_storage/chunked_vector_storage.rs (2 hunks)
  • lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (13 hunks)
  • lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (6 hunks)
  • lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (3 hunks)
  • lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4 hunks)
  • lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8 hunks)
  • lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (3 hunks)
  • lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (6 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/utils.rs (2 hunks)
  • lib/segment/src/vector_storage/vector_storage_base.rs (3 hunks)
  • lib/segment/tests/integration/multivector_hnsw_test.rs (1 hunks)
  • lib/segment/tests/integration/sparse_vector_index_search_tests.rs (1 hunks)
  • tests/consensus_tests/fixtures.py (1 hunks)
  • tests/consensus_tests/test_hw_measurement.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • lib/segment/src/index/hnsw_index/hnsw.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
🔇 Additional comments (146)
tests/consensus_tests/fixtures.py (1)

50-64: Well-implemented utility function for vector updates

The new update_points_vector function is properly implemented to handle vector updates via the REST API. It follows the same pattern as other similar functions in this file (like update_points_payload), maintaining consistency in the codebase.

This function will be useful for testing the hardware measurement capabilities for vector upsertions, which is the main goal of this PR.

lib/segment/tests/integration/multivector_hnsw_test.rs (1)

109-109: Added hardware counter parameter correctly

The hardware counter parameter has been properly added to the insert_vector method call, which will allow tracking I/O operations during vector insertion.

The hw_counter variable used here was already defined at line 68, so this change is well integrated into the existing test.

lib/segment/benches/sparse_index_build.rs (2)

9-9: Correctly imported hardware counter module

The HardwareCounterCell import has been added properly to support the measurement functionality.


64-65: Hardware counter integration looks good

The hardware counter is properly initialized before the loop and correctly passed to the insert_vector method, enabling I/O operation measurement during benchmark tests.

This change is consistent with the PR's goal of adding measurement capabilities for vector storage upsertions.

Also applies to: 70-71

lib/segment/src/vector_storage/tests/utils.rs (2)

3-3: Properly added hardware counter import

The HardwareCounterCell import has been correctly added to support the measurement functionality.


29-30: Hardware counter correctly integrated into utility function

The hardware counter is properly initialized within the insert_distributed_vectors function and correctly passed to the insert_vector method. This enables I/O operation measurement during vector insertions in tests.

This change maintains consistency with the other modifications across the codebase for hardware measurement.

Also applies to: 36-36

lib/segment/src/index/hnsw_index/gpu/mod.rs (2)

97-97: Import of HardwareCounterCell added.

The addition of this import is necessary to support the new parameter in the insert_vector method call, which now requires a hardware counter.


152-156: Updated method call to include hardware counter.

The insert_vector method has been modified to include a new parameter for tracking hardware metrics during vector insertion operations. This change aligns with the PR objective to measure I/O writes for vector upsertions.

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (7)

6-6: New import for hardware counter functionality.

The addition of the HardwareCounterCell import is necessary to support the hardware counter-enhanced vector operations.


38-38: Hardware counter instance created for test.

A new HardwareCounterCell instance is created to pass to the insert_vector method calls, enabling hardware metrics tracking during testing.


43-43: Updated method call with hardware counter parameter.

The insert_vector method call now includes the hardware counter, ensuring that I/O operations can be measured during vector insertions in tests.


129-129: Hardware counter instance for update_from_delete_points test.

Similar to the previous test, a hardware counter is created to enable metrics tracking during vector operations.


141-141: Updated method call for consistent hardware metrics tracking.

This change ensures that vector insertions during the update from delete points test are properly tracked for I/O metrics.


220-220: Hardware counter for persistence test.

Consistent addition of hardware counter to the persistence test function, maintaining the metric tracking capability across all test scenarios.


224-224: Updated method call in persistence test.

The final instance of insert_vector call update ensures consistent hardware metric tracking across all test cases.

lib/segment/src/index/sparse_index/sparse_vector_index.rs (4)

9-9: Added import for hardware counter functionality.

The import enables the use of hardware counters for tracking I/O operations during vector updates.


584-584: Updated method signature to include hardware counter.

The update_vector method now accepts a hw_counter parameter, allowing for metrics tracking during vector updates in the sparse index.


590-590: Vector storage insertion now tracks hardware metrics.

The insert_vector method call has been updated to pass the hardware counter, enabling I/O metrics tracking during vector insertion.


597-601: Default vector insertion now tracks hardware metrics.

When inserting a default vector, the operation now also tracks hardware metrics, ensuring consistent measurement across all vector operations.

lib/segment/src/fixtures/sparse_fixtures.rs (3)

7-7: Added import for hardware counter functionality.

This import supports the integration of hardware metrics tracking in the sparse index fixtures.


64-64: Created hardware counter instance for fixture setup.

A single hardware counter instance is created for tracking metrics during the vector insertion loop.


67-67: Updated vector insertion to track hardware metrics.

The insert_vector method call now includes the hardware counter parameter, ensuring consistent measurement of I/O operations during fixture setup.

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (8)

5-5: Adding necessary dependency for hardware metrics tracking.

Good addition of the HardwareCounterCell import to support the new metrics functionality.


33-34: Hardware counter initialization for vector operations.

Properly initializes the hardware counter before vector insertions to track I/O metrics.


37-38: Updated method call with hardware counter parameter.

Correctly passes the hardware counter to the insert_vector method, aligning with the PR objective to measure vector storage upsertions.


119-120: Hardware counter initialization in update test.

Good placement of the counter initialization before the vector operations.


134-135: Updated vector insertion with hardware counter.

Consistent pattern of updating vector insertion calls with hardware counter throughout the test suite.


192-193: Hardware counter for score points test.

Properly initializes the counter for scoring points test.


196-197: Updated vector insertion with metrics tracking.

Maintains consistency with other counter-enabled vector insertions.


267-271: Hardware counter for quantized points test.

Correctly implements hardware counter tracking in the quantized points test and properly updates the insertion method call.

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (6)

6-6: Adding hardware counter module import.

Good addition of the necessary import for measuring I/O performance.


60-61: Hardware counter initialization for multi-dense vectors.

Properly initializes the counter before vector operations in the delete points test.


65-66: Updated insertion with hardware counter tracking.

Correctly implements the updated method call to track vector insertion metrics.


188-189: Hardware counter for update operations.

Good initialization of the counter in the update from delete points test.


205-206: Consistent pattern of metrics tracking.

Maintains the same pattern of updating vector insertion calls with hardware counter tracking.


358-359: Hardware counter for large multi-dense vector test.

Properly implements counter initialization and updates the vector insertion call in the large vector test.

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (6)

388-389: Inline counter creation in F32 vector storage.

Creates a hardware counter inline during the vector insertion call, providing metrics tracking for GPU vector operations.


416-417: Consistent counter usage in F16 vector storage.

Maintains consistent pattern of passing hardware counters to insertion operations.


444-445: Hardware counter tracking for U8 vector storage.

Implements metrics tracking for byte vector insertions, consistent with other vector types.


485-486: Counter for multi-dense F32 vector operations.

Properly implements hardware counter tracking for multi-dense vector insertions.


532-533: Counter for multi-dense F16 vector operations.

Maintains consistency with other vector type implementations by adding hardware counter tracking.


579-580: Counter for multi-dense U8 vector operations.

Completes the pattern of adding hardware counter tracking to all vector insertion operations.

lib/segment/benches/sparse_vector_storage.rs (4)

6-6: Adding hardware counter dependency for benchmarks.

Good addition of the necessary import to support I/O metrics tracking in benchmarks.


32-33: Benchmark hardware counter initialization.

Properly initializes a single counter to be used across benchmark operations.


39-40: Updated vector insertion in RocksDB benchmark.

Correctly implements hardware counter tracking in the RocksDB insertion benchmark to measure I/O performance.


65-66: Updated vector insertion in MMap benchmark.

Maintains consistency by adding hardware counter tracking to MMap vector insertion benchmark.

lib/segment/benches/vector_search.rs (3)

6-6: Good import addition for the new parameter.

The HardwareCounterCell import is correctly added to support the new parameter in insert_vector.


41-41: Well-placed initialization of the hardware counter.

The hardware counter is appropriately initialized before the vector insertion loop.


47-47: Consistent parameter addition to method call.

The hardware counter parameter is correctly passed to the insert_vector method, matching the updated method signature.

lib/segment/src/index/plain_vector_index.rs (4)

5-5: Good import addition for the new parameter.

The HardwareCounterCell import is correctly added to support the updated method signature.


184-189: Correctly updated method signature.

The update_vector method signature has been properly updated to include the hardware counter parameter, which aligns with the changes across the codebase.


193-193: Properly forwarded hardware counter to underlying method.

The hardware counter is correctly passed to the insert_vector method when a vector is provided.


200-200: Consistent hardware counter forwarding for default vectors.

The hardware counter is also properly passed when inserting default vectors, maintaining consistency in both code paths.

lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2)

536-536: Good hardware counter initialization in test.

Hardware counter is properly initialized before being used in the test function.


541-541: Correct passing of hardware counter to insert_vector.

The hardware counter is correctly passed to the insert_vector method, aligning with the updated API.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6)

9-9: Good import addition for the new parameter.

The HardwareCounterCell import is correctly added to support the updated method signature.


178-185: Correctly updated method signature.

The insert_vector method signature has been properly updated to include the hardware counter parameter. The method still correctly panics with the same message since direct vector updates are not supported in this storage type.


322-322: Well-placed hardware counter initialization in test.

The hardware counter is properly initialized in the test function before being used in method calls.


337-338: Correct update to test method calls.

All insert_vector calls have been updated to include the hardware counter parameter, maintaining consistency with the updated API.

Also applies to: 340-341, 343-344


377-378: Consistent parameter passing in tests.

The hardware counter is correctly passed in these test method calls, following the pattern established throughout the file.

Also applies to: 380-381


433-433: Good pattern of hardware counter initialization in multiple tests.

All test functions have been updated with proper hardware counter initialization before using it in method calls, showing careful attention to consistent implementation.

Also applies to: 556-557, 637-638, 723-724

lib/segment/src/fixtures/index_fixtures.rs (1)

79-84: Implementation follows the trait's updated signature

The insert_vector method has been updated to include a hardware counter parameter, which aligns with changes across the codebase for measuring vector storage operations. The parameter is correctly prefixed with an underscore to indicate it's intentionally unused in this implementation.

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2)

465-465: Appropriate import added for test module

The HardwareCounterCell import has been correctly added to support the hardware counter usage in tests.


524-528: Vector insertion now tracks hardware operations

The insert_vector method call has been updated to include the hardware counter parameter, ensuring that GPU vector operations are properly measured. This change is consistent with the updated method signature throughout the codebase.

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (4)

8-8: Required import added

The HardwareCounterCell import has been correctly added to support hardware counter functionality.


113-125: Method signature updated to support hardware counter tracking

The insert_vector method has been updated to include a hardware counter parameter, which is then correctly passed to the underlying vector storage. This change enables tracking of I/O operations during vector insertions.


133-133: Appropriate use of disposable hardware counter

A disposable hardware counter is created for internal operations, which follows the pattern used elsewhere in the codebase. The comment clearly explains the purpose of this counter.


138-138: Hardware counter correctly passed to vector operation

The disposable hardware counter is properly passed to the push method, ensuring consistent tracking of I/O operations throughout the codebase.

lib/segment/src/segment/tests.rs (3)

516-517: Test updated to use hardware counter

The replace_all_vectors method call has been updated to include the hardware counter parameter, ensuring that tests match the updated method signatures.


531-532: Hardware counter correctly passed to vector operation

The hardware counter is properly passed to the vector replacement operation, maintaining consistency with the updated API.


655-655: Test validation flows updated with hardware counter

All vector operation method calls in the compatibility check tests have been updated to include the hardware counter parameter, ensuring comprehensive test coverage for the updated API.

Also applies to: 659-659, 663-663

lib/segment/src/index/vector_index_base.rs (3)

4-4: New import for hardware counter tracking

This import adds the necessary HardwareCounterCell type that will be used to track I/O operations during vector updates.


58-63: Updated method signature to include hardware counter tracking

The update_vector method in the VectorIndex trait now includes a hw_counter parameter which allows tracking I/O operations when updating vectors. This is consistent with the PR's objective to measure vector upsertions.


238-262: Implementation consistently forwards hardware counter to all variant implementations

The implementation of update_vector for VectorIndexEnum correctly passes the hardware counter to all variant implementations, ensuring consistent tracking across all vector index types.

tests/consensus_tests/test_hw_measurement.py (8)

4-5: Added import for vector update testing

The import now includes update_points_vector, which is essential for testing the vector I/O metrics being added in this PR.


29-29: Added vector I/O write assertion

This assertion verifies that vector I/O write metrics are being collected during the initial insertion phase.


34-52: Added tracking and validation for vector I/O metrics

This section calculates the expected vector I/O based on the number of vectors and their dimensions, and verifies that the reported metrics meet the expected thresholds. The logic for calculating expected delta is clear and accounts for the vector structure (dimensions and byte size).


60-61: Renamed variable and added vector update tracking

Good variable renaming from update_hw_data to update_payload_hw_data for clarity, and added update_vectors_hw_data to track vector update operations separately.


64-64: Added vector I/O metrics tracking and validation

These changes track the total vector I/O metrics and verify they meet the expected thresholds based on the vector structure.

Also applies to: 72-72, 78-78


81-82: Added validation for API response metrics

These assertions verify that the hardware data reported in the API response matches the data collected through telemetry, ensuring consistency between reporting mechanisms.


109-111: Added vector metrics tracking for the non-waiting test

These changes track vector I/O metrics in the test that doesn't wait for operations to complete, covering an important edge case.

Also applies to: 114-114


117-117: Added validation for total vector metrics

This assertion ensures that all vectors have been accounted for in the metrics, providing a comprehensive validation of the metrics collection.

lib/segment/src/vector_storage/vector_storage_base.rs (3)

6-6: Added import for hardware counter

This import brings in the HardwareCounterCell type that will be used for tracking I/O operations during vector storage operations.


64-69: Updated VectorStorage trait interface for hardware counter tracking

The insert_vector method in the VectorStorage trait now includes a hw_counter parameter for tracking I/O operations. This is the core interface change that enables the PR's objective of measuring vector upsertions.


594-645: Implementation consistently forwards hardware counter to all storage variants

The implementation of insert_vector for VectorStorageEnum correctly passes the hardware counter to all variant implementations, ensuring consistent tracking across all vector storage types. The implementation is thorough and handles all cases in the enum.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)

214-224: Updated insert_vector to use provided hardware counter

The insert_vector method now accepts a hardware counter parameter and passes it to update_stored, replacing the previous approach of creating a disposable counter internally. This change aligns with the trait interface update and enables proper tracking of I/O operations during vector insertion.


304-304: Added hardware counter import in test module

The import in the test module ensures that the HardwareCounterCell type is available for testing.


358-359: Created hardware counter for test cases

A single HardwareCounterCell instance is created for use across all test cases, demonstrating the intended usage pattern.


363-371: Updated test calls to include hardware counter

All test calls to insert_vector now include the hardware counter parameter, ensuring that the tests properly validate the new function signature.


232-232:

Details

❓ Verification inconclusive

Note existing TODO comment for future hardware counter propagation

There's an existing TODO comment about propagating hardware counter values in the update_from method. This is likely outside the scope of the current PR but might be worth addressing in a future update.

Is this TODO planned to be addressed in a separate PR?


🏁 Script executed:

#!/bin/bash
# Search for similar TODO comments about hardware counter propagation
rg -A 1 "TODO\(io_measurement\)" --glob "*.rs"

Length of output: 1815


Hardware Counter Propagation – Confirmation Needed

The TODO on line 232 in lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs remains as a placeholder for propagating the hardware counter value. Our repository also contains similar TODO(io_measurement) comments in other areas (e.g., src/actix/api/update_api.rs and lib/collection/src/collection/sharding_keys.rs), which suggests that the hardware counter propagation is being tracked for a future update.

Could you please confirm if this functionality is planned to be implemented in a separate PR?

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (6)

7-7: Added import for hardware counter tracking.

The addition of the HardwareCounterCell import supports the PR's objective of measuring I/O writes during vector operations.


185-190: Updated method signature to include hardware counter.

The insert_vector method now accepts a hw_counter parameter, which is consistent with the pattern being applied across the codebase to track I/O operations during vector insertions.


228-233: Hardware counter correctly passed to underlying operation.

The hw_counter is now passed to the insert_many method call, enabling proper tracking of I/O operations at the vector storage level.


234-235: Hardware counter passed to offset insertion.

The hw_counter is now passed to the offset insertion operation, ensuring comprehensive tracking of I/O operations.


247-247: Created disposable hardware counter for internal operation.

Using a disposable counter for internal operations is a good practice, as it allows for separation between externally-tracked operations and internal implementation details.


253-253: Properly uses the disposable counter for internal vector insertion.

The disposable counter is correctly passed to the insert_vector method, maintaining consistent tracking behavior throughout the code.

lib/segment/src/vector_storage/chunked_vector_storage.rs (4)

3-3: Added import for hardware counter tracking.

The import is necessary for implementing the hardware counter tracking feature throughout the codebase.


24-28: Updated trait method signature to include hardware counter.

The push method signature in the ChunkedVectorStorage trait now includes a hw_counter parameter, standardizing the approach to hardware counter tracking across implementations.


30-35: Updated insert method to include hardware counter.

The method signature update ensures consistency in hardware counter usage across the interface.


37-43: Updated insert_many method to include hardware counter.

The signature change completes the pattern of adding hardware counter tracking to all vector insertion methods in the trait.

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (8)

6-6: Added import for hardware counter tracking.

The import is consistent with the pattern seen in other files and supports the PR's objective.


98-104: Updated method signature to include hardware counter.

The update_stored method now accepts a hw_counter parameter to track I/O operations during storage updates.


118-124: Implemented hardware counter increment for vector I/O writes.

This code properly tracks the serialized size of both the key and record being written to storage, which aligns with the goal of measuring I/O writes during vector operations.


203-208: Updated method signature to include hardware counter.

The insert_vector method signature now includes the hw_counter parameter, maintaining consistency with the rest of the codebase.


213-213: Hardware counter passed to underlying operation.

The hw_counter is correctly forwarded to the update_stored method, ensuring proper tracking at all levels.


223-223: Created disposable hardware counter for internal operations.

Using a disposable counter for internal update operations is appropriate since these operations don't need external tracking.


231-231: Correctly uses the disposable counter for internal storage updates.

The disposable counter is properly passed to update_stored, maintaining consistent tracking behavior.


248-253: Added disposable counter for delete operations.

Vector deletion operations also use a disposable counter, with a comment clarifying that deletions are not measured. This is a reasonable approach as the PR focuses specifically on measuring vector insertions/updates.

lib/segment/src/segment/segment_ops.rs (6)

48-53: Updated method signature to include hardware counter.

The replace_all_vectors method now accepts a hw_counter parameter, which is necessary for propagating the hardware counter through the call stack.


59-59: Hardware counter passed to underlying vector operation.

The hw_counter is correctly forwarded to the update_vector method, ensuring proper tracking at all levels.


78-83: Updated method signature to include hardware counter.

The update_vectors method now accepts a hw_counter parameter, consistent with the pattern applied throughout the codebase.


89-89: Hardware counter passed to underlying vector operation.

The hw_counter is properly forwarded to the update_vector method call, ensuring comprehensive tracking.


99-104: Updated method signature to include hardware counter.

The insert_new_vectors method signature now includes the hw_counter parameter, completing the pattern of hardware counter propagation through segment operations.


111-111: Hardware counter passed to underlying vector operation.

The hw_counter is correctly forwarded to the update_vector method, ensuring proper tracking at all levels.

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8)

7-7: Added hardware counter import for vector I/O measurement.

The import of HardwareCounterCell is required for implementing the I/O write measurements for vector operations.


202-208: Method signature updated to include hardware counter parameter.

The update_stored method now accepts a hardware counter parameter, allowing for tracking of I/O operations during vector storage updates.


222-230: Implemented I/O write measurement using hardware counter.

This implementation correctly measures the serialized size of both the key and record when writing to storage by incrementing the vector I/O write counter with the combined size.


235-241: Method signature updated to include hardware counter parameter.

The insert_vector_impl method has been modified to accept the hardware counter parameter, which is then passed to downstream operations for consistent I/O measurement.


282-283: Updated method call to pass hardware counter parameter.

The call to update_stored now correctly passes the hardware counter parameter to ensure I/O measurements are captured during vector storage operations.


376-383: Public method signature updated to include hardware counter parameter.

The insert_vector method implementation now accepts the hardware counter parameter and passes it through to the internal implementation, ensuring consistent I/O measurement throughout the vector insertion pipeline.


396-401: Using disposable hardware counter for internal operations.

The code correctly uses a disposable hardware counter for internal operations during update_from since these operations don't need to be measured externally.


418-420: Deliberate exclusion of deletion operations from measurement.

The comment and implementation clearly indicate that deletion operations are intentionally not measured in the hardware counter, using a disposable counter instead.

lib/segment/src/segment/entry.rs (6)

111-117: Renamed parameter to indicate active usage.

The parameter was renamed from _hw_counter (indicating it was unused) to hw_counter (without underscore), reflecting that it's now being actively used in the implementation.


124-125: Passing hardware counter to vector replacement operation.

The hardware counter is now correctly passed to the replace_all_vectors method, enabling I/O write measurement during vector replacement operations.


127-128: Passing hardware counter to vector insertion operation.

The hardware counter is now correctly passed to the insert_new_vectors method, enabling I/O write measurement during new vector insertion operations.


170-176: Renamed parameter to indicate active usage.

The parameter was renamed from _hw_counter (indicating it was unused) to hw_counter (without underscore), reflecting that it's now being actively used in the implementation.


186-187: Passing hardware counter to vector update operation.

The hardware counter is now correctly passed to the update_vectors method, enabling I/O write measurement during vector update operations.


198-199: TODO comment for future hardware counter implementation.

The comment clearly indicates that hardware counter values for the delete_vector method are planned but not yet implemented, consistent with the approach in other files where deletions are not measured.

Have you considered implementing the hardware counter for the delete_vector method in a future PR? The comment suggests this is planned.

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (4)

3-3: Added hardware counter import for vector I/O measurement.

The import of HardwareCounterCell is required for implementing the I/O write measurements for vector operations.


57-64: Method signature updated to include hardware counter parameter.

The push method now accepts a hardware counter parameter and passes it to the underlying storage implementation, ensuring consistent I/O measurement.


67-74: Method signature updated to include hardware counter parameter.

The insert method now accepts a hardware counter parameter and passes it to the underlying storage implementation, ensuring consistent I/O measurement.


77-86: Method signature updated to include hardware counter parameter.

The insert_many method now accepts a hardware counter parameter and passes it to the underlying storage implementation, ensuring consistent I/O measurement.

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (11)

5-5: Added hardware counter import for vector I/O measurement.

The import of HardwareCounterCell is required for implementing the I/O write measurements for vector operations.


194-201: Method signature updated to include hardware counter parameter.

The insert method now accepts a hardware counter parameter and passes it to the insert_many method, ensuring consistent I/O measurement throughout the vector insertion pipeline.


203-210: Method signature updated to include hardware counter parameter.

The insert_many method now accepts a hardware counter parameter to enable I/O write measurement.


256-263: Method signature updated to include hardware counter parameter.

The push method now accepts a hardware counter parameter and passes it to the insert method, ensuring consistent I/O measurement throughout the vector insertion pipeline.


361-367: Trait implementation updated to include hardware counter parameter.

The push method implementation in the ChunkedVectorStorage trait now accepts and forwards the hardware counter parameter correctly.


370-377: Trait implementation updated to include hardware counter parameter.

The insert method implementation in the ChunkedVectorStorage trait now accepts and forwards the hardware counter parameter correctly.


380-388: Trait implementation updated to include hardware counter parameter.

The insert_many method implementation in the ChunkedVectorStorage trait now accepts and forwards the hardware counter parameter correctly.


434-435: Added hardware counter for test cases.

A hardware counter instance is created for use in the unit tests, ensuring the new parameter requirements are properly tested.


451-452: Updated test method call to include hardware counter parameter.

The test case now correctly passes the hardware counter when calling the push method.


484-491: Updated test method calls to include hardware counter parameter.

The test cases now correctly pass the hardware counter when calling the insert method at multiple points, ensuring thorough test coverage of the new parameter.


237-240:

Details

❓ Verification inconclusive

Implemented I/O write measurement using hardware counter.

This is where the actual measurement happens - incrementing the vector I/O write counter with the size of the written vectors.


🏁 Script executed:

#!/bin/bash
# Verify that the size_of_val function is properly imported
rg "use std::mem::size_of_val" lib/segment/src/vector_storage/chunked_mmap_vectors.rs

Length of output: 87


Attention: Confirm the Import of size_of_val

The implementation correctly increments the vector I/O write counter with the size of the written vectors. However, the verification script for checking a direct import of std::mem::size_of_val in the file produced no output. Please ensure that one of the following is true:

  • The file explicitly imports size_of_val using a statement like
    use std::mem::size_of_val;
  • Alternatively, a glob import (e.g., use std::mem::*;) is present either in this file or in a module that guarantees size_of_val is in scope.

If neither is the case, the build may fail. Kindly verify manually (or update the file) to ensure that size_of_val is accessible in lib/segment/src/vector_storage/chunked_mmap_vectors.rs at lines 237–240.

        hw_counter
            .vector_io_write_counter()
            .incr_delta(size_of_val(vectors));
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

8-8: No concerns with the new import
The import of HardwareCounterCell is straightforward and necessary for tracking hardware counters in this file.


250-263: Verify whether internal upserts need to be measured
A disposable counter is used here because this method is for internal operations, yet it might be beneficial to include these upserts in I/O measurements for a full picture. Confirm if the decision to skip them is intentional.


279-280: Skipping deletion measurements
Invoking update_stored with a disposable counter indicates that deletion is deliberately not measured. This approach is consistent with your design choice to focus on insertion/upsert metrics.

@JojiiOfficial JojiiOfficial merged commit bf47c97 into dev Mar 10, 2025
17 checks passed
@JojiiOfficial JojiiOfficial deleted the measure_vector_insert_io_writes branch March 10, 2025 14:27
timvisee pushed a commit that referenced this pull request Mar 21, 2025
* Add counter to vector storage API

* Measuring IO write for dense vectors

* Tests, GPU and sparse

* Measure multi vectors too

* Fix comment
@coderabbitai coderabbitai bot mentioned this pull request Mar 18, 2026
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants