Add JDK22+ heap-backed native vector scorer suppliers#142812
arup-chauhan wants to merge 2 commits into elastic:main
benwtrent left a comment
- please benchmark
- bulk scoring actually needs to be bulk scoring
    }

    @Override
    HeapByteVectorScorerSupplier copyInternal() {
why doesn't this just override copy directly?
@thecoop I removed the copyInternal() indirection and now each concrete heap supplier overrides copy() directly
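To illustrate the point raised in this review thread, here is a minimal sketch of overriding copy() directly with a covariant return type, which is what makes a separate copyInternal() hook unnecessary. The class names below (SupplierSketch, HeapByteSupplierSketch) are hypothetical stand-ins and not the classes in this PR.

```java
public final class CopyOverrideSketch {

    /** Hypothetical base type; the real supplier hierarchy in the PR may differ. */
    public abstract static class SupplierSketch {
        public abstract SupplierSketch copy();
    }

    /** Hypothetical heap-backed supplier that narrows the return type of copy(). */
    public static final class HeapByteSupplierSketch extends SupplierSketch {
        public final byte[][] vectors;

        public HeapByteSupplierSketch(byte[][] vectors) {
            this.vectors = vectors;
        }

        // Covariant return: callers holding the concrete type get the concrete
        // type back, so no intermediate copyInternal() method is required.
        @Override
        public HeapByteSupplierSketch copy() {
            // The copy shares the read-only vector data but is a distinct object.
            return new HeapByteSupplierSketch(vectors);
        }
    }
}
```

Callers that only know the base type still call copy() and receive a SupplierSketch, so the narrowing does not change the base contract.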
Pinging @elastic/es-search-relevance (Team:Search Relevance)
@benwtrent thanks, this is addressed now. Bulk scoring is now truly bulk: the heap-backed path packs the selected vectors into a contiguous buffer, calls the native bulk functions, and then applies the similarity-specific normalization step. I also ran a focused indexing benchmark with
While running this, I also found a bug in ordinal handling during incremental HNSW build (we were effectively treating
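For readers following the thread, the pack, score, normalize flow described above can be sketched roughly as follows. This is not the PR's code: bulkDotProduct is a pure-Java stand-in for a native bulk kernel, the class and method names are hypothetical, and the normalization shown is modelled on Lucene's byte dot-product scaling (0.5 + dot / (dims * 2^15)) as an assumption about the similarity in use.

```java
public final class BulkScoreSketch {

    /**
     * Stand-in for a native bulk kernel (hypothetical). It scores `count`
     * vectors stored back to back in `packed` against `query`.
     */
    static void bulkDotProduct(byte[] packed, byte[] query, int dims, int count, float[] out) {
        for (int i = 0; i < count; i++) {
            int base = i * dims;
            int sum = 0;
            for (int d = 0; d < dims; d++) {
                sum += packed[base + d] * query[d];
            }
            out[i] = sum;
        }
    }

    /** Packs the selected ordinals contiguously, scores them in one call, then normalizes. */
    public static float[] bulkScore(byte[][] vectors, int[] ordinals, byte[] query) {
        int dims = query.length;

        // Step 1: copy each selected vector into one contiguous buffer.
        byte[] packed = new byte[ordinals.length * dims];
        for (int i = 0; i < ordinals.length; i++) {
            System.arraycopy(vectors[ordinals[i]], 0, packed, i * dims, dims);
        }

        // Step 2: a single bulk call instead of one call per vector.
        float[] scores = new float[ordinals.length];
        bulkDotProduct(packed, query, dims, ordinals.length, scores);

        // Step 3: similarity-specific normalization (assumed byte dot-product scaling).
        float denom = (float) (dims * (1 << 15));
        for (int i = 0; i < scores.length; i++) {
            scores[i] = 0.5f + scores[i] / denom;
        }
        return scores;
    }
}
```

The point of the packing step is that the kernel sees one contiguous region and one call boundary, rather than paying the call overhead once per candidate vector.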
Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>
Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>
ae0440b to 50955d4
Hello @arup-chauhan, thanks for the benchmarks. Can you add some more details on how you ran them?
Hey @ldematte, thanks for checking this. You’re right that my run is not directly comparable to yours. Here is exactly what I ran: Command: Config: Results (indexing only):
Here is my hardware:
This is BYTE vectors, 128 dims, 100k docs, and search/query was not executed in this config (search metrics are zero). So this does not cover float32 + higher dimensions, where behavior can differ significantly. I agree ARM + larger float vectors may change the outcome materially.
@arup-chauhan isn't it obvious that you need to benchmark with
Description
This PR implements an Elasticsearch-first fix for #142379 by enabling native vector scorer suppliers during the array-backed phase of HNSW graph building (JDK 22+), while preserving existing off-heap paths and Lucene fallback behavior.
Context from issue discussion:
- vectorValue), so the existing index-slice/off-heap native supplier path is not always used.
- MemorySegment supplier path in Elasticsearch first, as requested in the issue thread.
Changes
- VectorScorerFactory API for array-backed suppliers:
  - getFloatVectorScorerSupplier(VectorSimilarityType, FloatVectorValues)
  - getByteVectorScorerSupplier(VectorSimilarityType, ByteVectorValues)
- simdvec: VectorScorerFactoryImpl now returns heap-backed suppliers for array-backed values.
- ES93FlatVectorScorer now tries:
- FloatVectorScorerFactoryTests.testArrayBackedRandomSupplier
- ByteVectorScorerFactoryTests.testArrayBackedRandomSupplier
Behavior / Safety
- Runtime.version().feature() >= 22).
Validation
Ran with runtime JDK 25 (JDK22+ path active):
All above commands completed successfully.
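As an illustration of the JDK gate listed under Behavior / Safety combined with the fallback behavior mentioned in the Description, a selection routine might look roughly like the sketch below. The names (ScorerSelectionSketch, select) and the Optional-returning native supplier are assumptions made for the example, not the PR's actual signatures; only Runtime.version().feature() >= 22 is taken from the description.

```java
import java.util.Optional;
import java.util.function.Supplier;

public final class ScorerSelectionSketch {

    /** Mirrors the gate from the description: the heap-backed native path needs JDK 22+. */
    public static boolean nativeHeapPathAvailable(int featureVersion) {
        return featureVersion >= 22;
    }

    /**
     * Tries the native supplier first when the JDK allows it, otherwise (or if
     * the native factory declines by returning empty) uses the fallback.
     * The feature version is a parameter so the logic can be exercised in tests.
     */
    public static <T> T select(int featureVersion, Supplier<Optional<T>> nativeSupplier, Supplier<T> fallback) {
        if (nativeHeapPathAvailable(featureVersion)) {
            Optional<T> candidate = nativeSupplier.get();
            if (candidate.isPresent()) {
                return candidate.get();
            }
        }
        return fallback.get();
    }

    /** Convenience overload that reads the running JDK's feature version. */
    public static <T> T select(Supplier<Optional<T>> nativeSupplier, Supplier<T> fallback) {
        return select(Runtime.version().feature(), nativeSupplier, fallback);
    }
}
```

Keeping the fallback as the last step means older runtimes, and any case the native factory does not support, behave exactly as they did before the change.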