Add BULK_SPARSE native vector operations for BBQ and INT4 #145676
Merged
ChrisHegarty merged 23 commits into elastic:main on Apr 3, 2026
Add a TData template parameter to the BBQ (dotd1q4, dotd2q4, dotd4q4) and INT4 (doti4) bulk scoring templates on both amd64 and aarch64 tier-1. This aligns them with call_i8_bulk in vec_1.cpp, which already uses TData to support sequential_mapper, offsets_mapper, and sparse_mapper through the same template. No functional change — existing sequential and offsets instantiations are updated to pass int8_t as TData explicitly.
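As a rough illustration of why a TData parameter lets one template serve all three access patterns, here is a minimal scalar sketch. The mapper names come from the description above, but their signatures, the `dot_bulk` helper, and the plain int8 dot product are assumptions made for illustration; the real templates in vec_1.cpp and the BBQ/INT4 kernels may look quite different.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mapper signatures: each returns a pointer to the i-th vector.
// sequential: vectors are laid out back to back, `pitch` bytes apart.
inline const int8_t* sequential_mapper(const int8_t* base, int32_t i,
                                       const int32_t* /*offsets*/, int32_t pitch) {
    return base + static_cast<std::ptrdiff_t>(i) * pitch;
}

// offsets: vector ordinals are looked up in `offsets`, all inside one region.
inline const int8_t* offsets_mapper(const int8_t* base, int32_t i,
                                    const int32_t* offsets, int32_t pitch) {
    return base + static_cast<std::ptrdiff_t>(offsets[i]) * pitch;
}

// sparse: the "data" argument is itself an array of independent addresses.
inline const int8_t* sparse_mapper(const int8_t* const* addresses, int32_t i,
                                   const int32_t* /*offsets*/, int32_t /*pitch*/) {
    return addresses[i];
}

// One bulk template for all three cases. TData is int8_t for the sequential
// and offsets mappers, and `const int8_t*` for the sparse mapper.
template <typename TData,
          const int8_t* (*mapper)(const TData*, int32_t, const int32_t*, int32_t)>
void dot_bulk(const TData* data, const int8_t* query, int32_t dims, int32_t pitch,
              const int32_t* offsets, int32_t count, float* results) {
    for (int32_t c = 0; c < count; ++c) {
        const int8_t* vec = mapper(data, c, offsets, pitch);
        int32_t acc = 0;
        for (int32_t d = 0; d < dims; ++d) {
            acc += vec[d] * query[d];
        }
        results[c] = static_cast<float>(acc);
    }
}
```

Under these assumptions, `dot_bulk<int8_t, sequential_mapper>` and `dot_bulk<int8_t, offsets_mapper>` correspond to the existing instantiations that pass int8_t explicitly, while `dot_bulk<const int8_t*, sparse_mapper>` is the shape a sparse instantiation would take.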
… bbq_int4_template_refactor
Add vec_dot{d1q4,d2q4,d4q4,i4}_bulk_sparse native exports on amd64 and
aarch64 tier-1, using the TData/sparse_mapper template instantiation
introduced in a previous PR. Enable BULK_SPARSE for BBQ and INT4 in
JdkVectorLibrary with appropriate bounds checking, and add corresponding
method handles and wrapper methods in Similarities.
Includes unit tests for both BBQ and INT4 bulk sparse operations:
contiguous slices, scattered (non-contiguous) allocations, and illegal
argument validation.
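To make the shape of a bulk sparse export concrete, the following is a scalar sketch of a packed INT4 dot product driven by an array of addresses. The function name `example_doti4_bulk_sparse`, its parameter list, and the nibble layout (low nibble first) are all assumptions for illustration only; they are not the signature or packing used by the actual vec_doti4_bulk_sparse export.

```cpp
#include <cstdint>

// Scalar dot product between a packed INT4 vector and an unpacked query.
// Assumed layout (illustrative only): byte k holds element 2k in its low
// nibble and element 2k+1 in its high nibble. `dims` is assumed to be even.
static int32_t dot_int4_packed(const uint8_t* packed, const int8_t* query, int32_t dims) {
    int32_t acc = 0;
    for (int32_t k = 0; k < dims / 2; ++k) {
        acc += (packed[k] & 0x0F) * query[2 * k];
        acc += (packed[k] >> 4) * query[2 * k + 1];
    }
    return acc;
}

// Hypothetical export: one independent address per vector, one score per address.
extern "C" void example_doti4_bulk_sparse(const void* const* addresses,
                                          const int8_t* query,
                                          int32_t dims,
                                          int32_t count,
                                          float* results) {
    for (int32_t c = 0; c < count; ++c) {
        const uint8_t* vec = static_cast<const uint8_t*>(addresses[c]);
        results[c] = static_cast<float>(dot_int4_packed(vec, query, dims));
    }
}
```

The point of the sketch is only the calling convention: the caller passes `count` addresses rather than a base pointer plus offsets, so the vectors may live in unrelated allocations.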
Collaborator
Pinging @elastic/es-search-relevance (Team:Search Relevance)
ldematte
approved these changes
Apr 3, 2026
if (f != Function.DOT_PRODUCT && type == DataType.INT4) continue;
// BULK_SPARSE only for INT7U and INT8 — no native sparse functions exist for FLOAT32 or INT4
if (op == Operation.BULK_SPARSE && (type == DataType.FLOAT32 || type == DataType.INT4)) continue;
// BULK_SPARSE only for INT7U, INT8, and INT4 — no native sparse functions exist for FLOAT32
Contributor
I think this is a gap we need to close; we should have BULK_SPARSE for all, but I think it's better to address it in a separate PR.
Contributor
Also, I thought we also had BFLOAT16, but maybe these are wired differently. Worth a check.
libs/native/src/test/java/org/elasticsearch/nativeaccess/jdk/JDKVectorLibraryInt4Tests.java
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request on Apr 9, 2026
…5676)
This PR adds BULK_SPARSE native exports for BBQ (d1q4, d2q4, d4q4) and packed INT4 vector dot-product operations on both amd64 and aarch64. This fills out BULK_SPARSE support for these element types, consistent with INT7U and INT8 which already have sparse operations.
Unlike BULK_OFFSETS, which requires all vectors to reside in a single contiguous memory region, BULK_SPARSE accepts an array of independent memory addresses, one per vector. This enables efficient bulk scoring over scatter-gather data, such as when vectors are backed by DirectAccessInput with its 16MiB region boundaries.
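The scatter-gather case can be sketched as follows: when vectors are spread over several separately mapped regions, a caller can resolve each ordinal to its own address and hand the resulting array to a sparse bulk function. The helper `gather_addresses` and its parameters are hypothetical, and the sketch assumes every region holds a whole number of vectors, so no vector straddles a region boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: resolve vector ordinals to addresses across
// separately allocated regions.
// Assumption: each region holds exactly `vectors_per_region` whole vectors,
// so a vector never crosses a region boundary.
std::vector<const int8_t*> gather_addresses(const std::vector<const int8_t*>& regions,
                                            std::size_t vectors_per_region,
                                            std::size_t vector_bytes,
                                            const std::vector<std::size_t>& ordinals) {
    std::vector<const int8_t*> addresses;
    addresses.reserve(ordinals.size());
    for (std::size_t ord : ordinals) {
        // Pick the region that owns this ordinal, then step to its slot.
        const int8_t* region = regions[ord / vectors_per_region];
        addresses.push_back(region + (ord % vectors_per_region) * vector_bytes);
    }
    return addresses;
}
```

With a single base pointer and offsets, this would only work if all regions happened to be adjacent in memory; with one address per vector, the regions can be anywhere.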
The new native functions use the sparse_mapper with the generalized TData bulk templates introduced in #145459. JdkVectorLibrary is updated to enable BULK_SPARSE for BBQ and INT4 with appropriate bounds checking, and Similarities gains corresponding method handles and Java wrapper methods. Unit tests cover contiguous slices, scattered (non-contiguous) allocations, and illegal argument validation for both BBQ (parameterized across d1q4, d2q4, d4q4) and INT4.