Skip to content

Add volatile sparse vector storage#6569

Merged
timvisee merged 4 commits intodevfrom
vector-storage-sparse-volatile-test
May 21, 2025
Merged

Add volatile sparse vector storage#6569
timvisee merged 4 commits intodevfrom
vector-storage-sparse-volatile-test

Conversation

@timvisee
Copy link
Member

@timvisee timvisee commented May 21, 2025

Depends on #6564

Same as #6564, but for sparse vectors.

Tasks

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

@timvisee timvisee added this to the Remove RocksDB milestone May 21, 2025
@timvisee timvisee force-pushed the vector-storage-multi-dense-volatile-test branch from 7bc5ead to 3897979 Compare May 21, 2025 12:08
Base automatically changed from vector-storage-multi-dense-volatile-test to dev May 21, 2025 12:35
@timvisee timvisee force-pushed the vector-storage-sparse-volatile-test branch from e20531c to 0e50337 Compare May 21, 2025 12:36
@timvisee timvisee marked this pull request as ready for review May 21, 2025 12:36
@coderabbitai

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
lib/segment/src/vector_storage/vector_storage_base.rs (1)

245-246: Boiler-plate explosion – consider macro-driven enum dispatch

Adding SparseVolatile required touching every single match arm in this file.
The pattern has already repeated for the dense & multi-dense volatile variants and is becoming hard to maintain.

A small macro_rules! helper (or enum_dispatch/paste crates) that generates the match arms based on a list of variants would:

  • eliminate this error-prone duplication,
  • make future variants cheaper to add,
  • keep the file readable.

This is an optional refactor, but it will pay off quickly given the pace of new storages.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1cf4b62 and 0e50337.

📒 Files selected for processing (10)
  • docs/redoc/master/openapi.json (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1 hunks)
  • lib/segment/src/types.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1 hunks)
  • lib/segment/src/vector_storage/raw_scorer.rs (1 hunks)
  • lib/segment/src/vector_storage/sparse/mod.rs (1 hunks)
  • lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2 hunks)
  • lib/segment/src/vector_storage/vector_storage_base.rs (22 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
  • new_volatile_sparse_vector_storage (34-36)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-consistency
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: integration-tests-consensus
  • GitHub Check: integration-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: lint
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (10)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

291-292: Correct handling of the new SparseVolatile variant.

The new match arm correctly handles the VectorStorageEnum::SparseVolatile variant by returning Err(OperationError::WrongSparse), consistent with other sparse vector storage variants. The #[cfg(test)] attribute ensures this code only appears in test builds.

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)

396-399: Consistent error handling for SparseVolatile on GPU.

The added match arm for SparseVolatile correctly returns an error indicating that sparse vectors are not supported on GPU, consistent with how other sparse vector storage types are handled. The #[cfg(test)] attribute appropriately limits this code to test builds.

lib/segment/src/vector_storage/sparse/mod.rs (1)

4-5: Well-structured module exposure for test builds.

Adding the volatile_sparse_vector_storage module with public visibility and conditional compilation for tests is appropriate. This follows the same pattern used for other volatile storage implementations in the codebase, making it available only for test scenarios.

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)

94-95: Appropriate match arm for unreachable case.

The addition of the SparseVolatile match arm marked as unreachable!() is correct, as this test is specifically for multi dense vector storage and shouldn't encounter sparse storage variants. This maintains consistency with how other sparse storage types are handled in this match statement.

lib/segment/src/vector_storage/raw_scorer.rs (1)

104-105: Properly implements scoring for the new SparseVolatile vector storage variant

The implementation correctly adds support for the new SparseVolatile variant in the vector storage enum, following the same pattern as other sparse vector storage types by delegating to the raw_sparse_scorer_impl function. The #[cfg(test)] attribute properly ensures this code is only compiled during test builds, consistent with other volatile storage implementations in this file.

docs/redoc/master/openapi.json (2)

12018-12018: Refined on_disk description for clarity

The description now explicitly mentions RocksDB storage, matching the underlying implementation and improving documentation accuracy.


12025-12025: Refined mmap description for clarity

The description now explicitly mentions GridStore (memory-mapped) storage, aligning docs with code behavior.

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)

19-19: Import added for the new volatile sparse vector storage functionality.

This import is correctly added to use the new volatile sparse vector storage implementation.


136-136: Successfully integrated volatile sparse vector storage for testing.

The test has been updated to use the new new_volatile_sparse_vector_storage() function instead of the previous storage implementation. This change appropriately tests the newly added in-memory volatile sparse vector storage implementation, which is the main objective of this PR.

lib/segment/src/types.rs (1)

1384-1384: Improved documentation clarity for sparse vector storage types.

The documentation for SparseVectorStorageType enum variants has been enhanced by specifying the backend technology for each storage type:

  • OnDisk: Clarified that it uses "rocksdb storage"
  • Mmap: Clarified that it uses "gridstore storage"

This provides better context about the underlying implementation of each storage option.

Also applies to: 1386-1386

coderabbitai[bot]

This comment was marked as resolved.

@timvisee timvisee merged commit 4581c56 into dev May 21, 2025
17 checks passed
@timvisee timvisee deleted the vector-storage-sparse-volatile-test branch May 21, 2025 14:07
generall pushed a commit that referenced this pull request May 22, 2025
* Add volatile sparse vector storage

* Add and use volatile sparse vector test

* Update OpenAPI spec

* Resolve review remarks
n0x29a pushed a commit that referenced this pull request May 23, 2025
* Add volatile sparse vector storage

* Add and use volatile sparse vector test

* Update OpenAPI spec

* Resolve review remarks
@coderabbitai coderabbitai bot mentioned this pull request Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants