
Add optimistic collection to DiversifyingNearestChildren* vector queries #15063

Merged
benwtrent merged 6 commits into apache:main from
benwtrent:feature/opt-knn-collector-all-the-things
Aug 25, 2025

Conversation

@benwtrent (Member) commented Aug 13, 2025

Optimistic knn search addresses a major issue where we return inconsistent results due to race conditions in the shared queue previously used over multi-segment search.

It not only returns consistent results, but is generally better overall when it comes to recall & latency.

This PR refactors the optimistic querying in AbstractKnnVectorQuery so that more sub-queries can take advantage of this logic.

Additionally, this refactors PatienceKnnVectorQuery to better fit this API and utilize the underlying multi-segment search logic.
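As a rough illustration (this is a toy model, not the actual Lucene code; the class and method names below are invented), the optimistic strategy can be sketched as a two-phase collection: each segment is first searched independently with a reduced budget, and only segments whose results still look competitive against the global top-k are re-searched with the full budget:

```java
import java.util.*;
import java.util.stream.*;

public class OptimisticKnnSketch {
    // Hypothetical per-segment search: the top-k scores from one segment.
    static List<Double> topK(List<Double> scores, int k) {
        return scores.stream()
                .sorted(Comparator.reverseOrder())
                .limit(k)
                .collect(Collectors.toList());
    }

    // Optimistic two-phase collection (toy model):
    // phase 1: each segment is searched with a reduced, "optimistic" budget;
    // phase 2: any segment whose worst returned hit still beats the current
    //          global k-th score may hold more competitive hits, so it is
    //          re-searched with the full budget.
    static List<Double> search(List<List<Double>> segments, int k) {
        int perLeafK = Math.max(1, k / segments.size()); // optimistic budget
        List<List<Double>> firstPass = segments.stream()
                .map(s -> topK(s, perLeafK))
                .collect(Collectors.toList());
        List<Double> global = topK(firstPass.stream().flatMap(List::stream)
                .collect(Collectors.toList()), k);
        double kthScore = global.get(global.size() - 1);
        for (int i = 0; i < segments.size(); i++) {
            List<Double> hits = firstPass.get(i);
            double worst = hits.get(hits.size() - 1);
            if (worst > kthScore) { // segment may hold more competitive hits
                firstPass.set(i, topK(segments.get(i), k));
            }
        }
        return topK(firstPass.stream().flatMap(List::stream)
                .collect(Collectors.toList()), k);
    }

    public static void main(String[] args) {
        List<List<Double>> segments = List.of(
                List.of(0.9, 0.85, 0.8),   // competitive segment: re-searched
                List.of(0.2, 0.1, 0.05));  // weak segment: left alone
        System.out.println(search(segments, 3)); // [0.9, 0.85, 0.8]
    }
}
```

Because each segment's collection is independent and the re-search decision depends only on already-collected scores, the final top-k is deterministic across runs, unlike results drawn from a queue shared across concurrently searching threads.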

Here are the results for nested vs. baseline. Note how baseline recall fluctuates, while the candidate's recall is consistent.

closes: #15059

BASELINE

recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.628        2.650   9.980        3.766  1000000   100     250       64        250         no        0      0.00      Infinity             7         2948.33      2929.688     2929.688       HNSW
 0.629        2.710  10.080        3.720  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.629        2.750  10.200        3.709  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.629        2.720  10.150        3.732  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.628        2.680  10.070        3.757  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.629        2.760  10.200        3.696  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW

Here is the baseline with a larger fanout to get recall similar to the candidate:

recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.722        5.157  20.486        3.972  1000000   100     600       64        250         no      0.00      Infinity             0            0.00         0.000        0.000       HNSW

Much higher latency and netCPU time.

CANDIDATE

recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.732        3.100  14.420        4.652  1000000   100     250       64        250         no        0      0.00      Infinity             7         2948.33      2929.688     2929.688       HNSW
 0.732        3.140  14.570        4.640  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.732        3.120  14.460        4.635  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.732        3.130  14.460        4.620  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.732        3.140  14.460        4.605  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 0.732        3.110  14.410        4.633  1000000   100     250       64        250         no        0      0.00      Infinity             0            0.00         0.000        0.000       HNSW

@github-actions (Contributor):

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@msokolov (Contributor):

The refactor seems more general and cleaner, thanks, but I think in the test you ran we are seeing a latency increase. Personally I think it's OK given the recall increase and better consistency, but the data is a bit at odds with the statement in the PR description.

@benwtrent (Member, Author):

Personally I think it's OK given the recall increase and better consistency, but the data is a bit at odds with the statement in the PR description.

It's an increase in latency with an increase in recall. Let me adjust the test so that we get similar recall from both.

Regardless, even with a slight latency increase, fixing the inconsistency takes precedence (IMO).

* GITHUB#14226: Use optimistic-with-checking KNN Query execution strategy in place of cross-thread
global queue min-score checking. Improves performance and consistency. (Mike Sokolov, Dzung Bui,
Ben Trent)

@benwtrent (Member, Author) commented on the CHANGES entry:

I noticed that it's actually in 10x for the 10.3 release, but the change note was only in v11.

A contributor replied:

thanks for fixing!

@github-actions github-actions bot modified the milestones: 10.3.0, 11.0.0 Aug 13, 2025
@benwtrent benwtrent modified the milestones: 11.0.0, 10.3.0 Aug 14, 2025
@tteofili (Contributor) left a comment:

thx Ben, LGTM

@github-actions github-actions bot modified the milestones: 10.3.0, 11.0.0 Aug 25, 2025
@benwtrent benwtrent modified the milestones: 11.0.0, 10.3.0 Aug 25, 2025
@benwtrent benwtrent merged commit b6e57da into apache:main Aug 25, 2025
8 checks passed
@benwtrent benwtrent deleted the feature/opt-knn-collector-all-the-things branch August 25, 2025 16:44
benwtrent added a commit that referenced this pull request Aug 25, 2025
…ies (#15063)

(The commit message repeats the PR description above.)
mayya-sharipova added a commit to mayya-sharipova/lucene that referenced this pull request Feb 9, 2026
These classes are no longer used after PR apache#14226 and apache#15063
switched KNN queries to optimistic collection strategy.

The previous approach used MultiLeafKnnCollector with a shared
BlockingFloatHeap to exchange top scores across segments during
concurrent search. This caused non-deterministic results due to
race conditions in score synchronization between threads.
mayya-sharipova added a commit that referenced this pull request Feb 9, 2026
(Same commit message as above.)
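To see why the shared-queue approach described in the commit message above was order-sensitive, here is a toy model (invented names, not the actual MultiLeafKnnCollector/BlockingFloatHeap API): segments consult a shared global top-k while searching, so how much work each segment does, and what it contributes, depends on which segment happened to run first:

```java
import java.util.*;

public class SharedQueueOrderEffect {
    // Toy model of the old shared-queue approach: each candidate is checked
    // against the global top-k min-score at the moment it is examined, so the
    // number of accepted candidates depends on segment execution order.
    static int candidatesAccepted(List<List<Double>> segmentsInRunOrder, int k) {
        PriorityQueue<Double> globalTopK = new PriorityQueue<>(); // min-heap
        int accepted = 0;
        for (List<Double> segment : segmentsInRunOrder) {
            for (double score : segment) {
                if (globalTopK.size() < k || score > globalTopK.peek()) {
                    globalTopK.add(score);
                    if (globalTopK.size() > k) globalTopK.poll();
                    accepted++;
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Double> a = List.of(0.9, 0.8, 0.7); // high-scoring segment
        List<Double> b = List.of(0.2, 0.3, 0.4); // low-scoring segment
        int k = 2;
        // Under concurrent search, thread scheduling decides this order:
        System.out.println(candidatesAccepted(List.of(a, b), k)); // 2
        System.out.println(candidatesAccepted(List.of(b, a), k)); // 5
    }
}
```

When the strong segment runs first, its scores raise the shared minimum early and the weak segment is pruned entirely (2 acceptances); when the weak segment runs first, its scores transiently enter the queue before being displaced (5 acceptances). With real threads, the interleaving is nondeterministic, which is the race the optimistic strategy removes.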
Successfully merging this pull request may close these issues.

Add optimistic knn search to other vector query types

3 participants