
Batched scans for dense vectors#7514

Merged
monoid merged 20 commits into dev from feat/batched-scans on Nov 20, 2025

Conversation

@monoid
Contributor

@monoid monoid commented Nov 11, 2025

Previously, each element of a batch request was processed separately. This change handles them together in the case of linear scans, exploiting memory caching to increase throughput by 2.8x.
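The idea can be sketched in a few lines. This is an illustrative toy, not Qdrant's actual API: the function names and data layout are assumptions. The point is the loop order: each stored vector is loaded once and scored against all queries while it is hot in cache, instead of running one full scan per query.

```rust
// Hypothetical sketch of a batched linear scan.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// For every query, return the index of the best-scoring stored vector.
fn batched_linear_scan(vectors: &[Vec<f32>], queries: &[Vec<f32>]) -> Vec<usize> {
    let mut best = vec![(f32::NEG_INFINITY, 0usize); queries.len()];
    // One pass over storage: each vector is read from memory once
    // and scored against every query, amortizing the memory traffic.
    for (i, v) in vectors.iter().enumerate() {
        for (q, slot) in queries.iter().zip(best.iter_mut()) {
            let score = dot(v, q);
            if score > slot.0 {
                *slot = (score, i);
            }
        }
    }
    best.into_iter().map(|(_, i)| i).collect()
}

fn main() {
    let vectors = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let queries = vec![vec![0.9, 0.1], vec![0.1, 0.9]];
    println!("{:?}", batched_linear_scan(&vectors, &queries)); // [0, 1]
}
```

With the per-query loop order (queries outer, vectors inner), the whole storage is streamed through memory once per query; swapping the loops as above is what lets the cache absorb the extra work.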

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using the cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using the cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@monoid monoid self-assigned this Nov 11, 2025
@monoid monoid changed the title WIP: batched scans Batched scans Nov 13, 2025
@monoid monoid changed the title Batched scans Batched scans for dense vectors Nov 14, 2025
@monoid monoid marked this pull request as ready for review November 14, 2025 08:43

@generall
Member

@monoid please resolve merge conflicts

@monoid
Contributor Author

monoid commented Nov 14, 2025

@monoid please resolve merge conflicts

Done.

@monoid monoid force-pushed the feat/batched-scans branch from e0505f4 to 4dbfe82 Compare November 14, 2025 15:22
@agourlay
Member

exploiting memory caching to increase throughput by 2.8x.

What exactly is the benchmark setup here? A micro-benchmark or end-to-end?

@monoid
Contributor Author

monoid commented Nov 17, 2025

exploiting memory caching to increase throughput by 2.8x.

What exactly is the benchmark setup here? A micro-benchmark or end-to-end?

It was a bfb benchmark with 8-16 search threads and batch search. But it was an edge case: a full linear scan.

@monoid
Contributor Author

monoid commented Nov 18, 2025

For a more realistic example, bfb -d 128 --keywords 10 --search-batch-size 50 gives a 2x speedup.

@monoid monoid force-pushed the feat/batched-scans branch from 4a84977 to 82842b6 Compare November 19, 2025 12:34
Member

@timvisee timvisee left a comment

Everything seems handled. Thanks! 🙌

@monoid monoid force-pushed the feat/batched-scans branch 2 times, most recently from 86f597d to 1bfb0f8 Compare November 19, 2025 17:05
BatchFilteredScorer is the correct name.
@monoid monoid force-pushed the feat/batched-scans branch from 1bfb0f8 to 6918419 Compare November 19, 2025 17:08
Contributor

@IvanPleshkov IvanPleshkov left a comment

After this commit (6918419), the topic about the deleted flags (#7514 (comment)) is still relevant: we don't use the deleted flag correctly in peek_top_all. Marking back as changes requested.

@monoid
Contributor Author

monoid commented Nov 19, 2025

After this commit (6918419), the topic about the deleted flags (#7514 (comment)) is still relevant: we don't use the deleted flag correctly in peek_top_all. Marking back as changes requested.

This commit is incorrect; I'm fixing it.

However, the deleted flag is not actually used in postprocessing. It is a required argument for FilteredScorer, but it is only used in the peek_* methods, which will eventually be removed.
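The deleted-flag concern can be illustrated with a minimal, hypothetical sketch; this peek_top_all is a stand-in, not Qdrant's actual signature. The point of the review comment is that deleted points must be filtered out before they can reach the result set:

```rust
/// Hypothetical sketch: return the ids of the `top` best-scoring points,
/// skipping any point whose deleted flag is set.
fn peek_top_all(scores: &[f32], deleted: &[bool], top: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..scores.len())
        .filter(|&i| !deleted[i]) // deleted points never enter the candidates
        .collect();
    // Sort remaining candidates by score, best first.
    ids.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap());
    ids.truncate(top);
    ids
}

fn main() {
    let scores = [3.0, 2.0, 1.0];
    let deleted = [false, true, false];
    println!("{:?}", peek_top_all(&scores, &deleted, 2)); // [0, 2]
}
```

Without the filter step, the deleted point with score 2.0 would incorrectly displace a live point from the top-2 result.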

@monoid monoid force-pushed the feat/batched-scans branch 2 times, most recently from 938eb0b to 60745bc Compare November 19, 2025 18:42
@monoid monoid force-pushed the feat/batched-scans branch from 60745bc to c566564 Compare November 19, 2025 18:46
@monoid monoid merged commit 428f1e7 into dev Nov 20, 2025
21 of 22 checks passed
@monoid monoid deleted the feat/batched-scans branch November 20, 2025 10:34
timvisee pushed a commit that referenced this pull request Nov 25, 2025
* Batched iteration for plain `HNSWIndex` searches
* Batched iteration for `PlainVectorIndex` search
@timvisee timvisee mentioned this pull request Nov 25, 2025
@coderabbitai coderabbitai bot mentioned this pull request Nov 27, 2025