Add BULK_SPARSE native vector operations for BBQ and INT4 #145676

Merged: ChrisHegarty merged 23 commits into elastic:main from ChrisHegarty:bbq_int4_bulk_sparse on Apr 3, 2026

@ChrisHegarty
Contributor

This PR adds BULK_SPARSE native exports for BBQ (d1q4, d2q4, d4q4) and packed INT4 vector dot-product operations on both amd64 and aarch64. This fills out BULK_SPARSE support for these element types, consistent with INT7U and INT8, which already have sparse operations.

Unlike BULK_OFFSETS, which requires all vectors to reside in a single contiguous memory region, BULK_SPARSE accepts an array of independent memory addresses, one per vector. This enables efficient bulk scoring over scatter-gather data, such as when vectors are backed by DirectAccessInput with its 16MiB region boundaries.
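
To make the addressing difference concrete, here is a minimal scalar sketch. It is not the code from this PR: the function names, signatures, and the plain int8 dot product are assumptions chosen for illustration, whereas the real kernels are SIMD and work on the BBQ and packed INT4 encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference dot product over signed bytes. Illustrative only: the
// real kernels are SIMD and operate on BBQ/INT4 encodings.
static int32_t dot_i8(const int8_t* a, const int8_t* b, int32_t dims) {
    int32_t sum = 0;
    for (int32_t i = 0; i < dims; ++i) {
        sum += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return sum;
}

// Offsets style (hypothetical signature): every vector lives in one region
// starting at `base`; vector c is found at base + offsets[c] * pitch.
void bulk_offsets_sketch(const int8_t* base, const int8_t* query, int32_t dims,
                         int32_t pitch, const int32_t* offsets, int32_t count,
                         float* results) {
    assert(count >= 0 && pitch >= dims);
    for (int32_t c = 0; c < count; ++c) {
        const int8_t* v = base + static_cast<std::size_t>(offsets[c]) * pitch;
        results[c] = static_cast<float>(dot_i8(v, query, dims));
    }
}

// Sparse style (hypothetical signature): addresses[c] points straight at
// vector c, so the vectors need not share a region.
void bulk_sparse_sketch(const int8_t* const* addresses, const int8_t* query,
                        int32_t dims, int32_t count, float* results) {
    assert(count >= 0);
    for (int32_t c = 0; c < count; ++c) {
        results[c] = static_cast<float>(dot_i8(addresses[c], query, dims));
    }
}
```

With the offsets variant the caller must guarantee that every `offsets[c] * pitch` stays inside the single region behind `base`. With the sparse variant each pointer can come from a different mapping, which is the situation described above when consecutive candidates fall on either side of a region boundary.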

The new native functions use the sparse_mapper with the generalized TData bulk templates introduced in #145459. JdkVectorLibrary is updated to enable BULK_SPARSE for BBQ and INT4 with appropriate bounds checking, and Similarities gains corresponding method handles and Java wrapper methods. Unit tests cover contiguous slices, scattered (non-contiguous) allocations, and illegal argument validation for both BBQ (parameterized across d1q4, d2q4, d4q4) and INT4.
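
As a rough illustration of the mapper-plus-TData idea, the sketch below shows how one bulk template could serve sequential, offsets, and sparse access. Everything in it is hypothetical: the `*_sketch` names, the mapper signatures, and the scalar int8 kernel are assumptions, and the actual templates in the native sources may be shaped differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mappers: each resolves candidate index i to a vector pointer.
struct sequential_mapper_sketch {
    static const int8_t* map(const int8_t* data, int32_t i, int32_t pitch,
                             const int32_t* /*offsets*/) {
        return data + static_cast<std::size_t>(i) * pitch;
    }
};

struct offsets_mapper_sketch {
    static const int8_t* map(const int8_t* data, int32_t i, int32_t pitch,
                             const int32_t* offsets) {
        return data + static_cast<std::size_t>(offsets[i]) * pitch;
    }
};

struct sparse_mapper_sketch {
    // Here `data` is an array of addresses, so TData is `const int8_t*`.
    static const int8_t* map(const int8_t* const* data, int32_t i,
                             int32_t /*pitch*/, const int32_t* /*offsets*/) {
        return data[i];
    }
};

// One bulk template; TData is int8_t for sequential/offsets access and
// `const int8_t*` for sparse access. The inner loop is a scalar stand-in
// for the real SIMD kernels.
template <typename TData, typename Mapper>
void bulk_dot_sketch(const TData* data, const int8_t* query, int32_t dims,
                     int32_t pitch, const int32_t* offsets, int32_t count,
                     float* results) {
    assert(count >= 0);
    for (int32_t c = 0; c < count; ++c) {
        const int8_t* v = Mapper::map(data, c, pitch, offsets);
        int32_t sum = 0;
        for (int32_t i = 0; i < dims; ++i) {
            sum += static_cast<int32_t>(v[i]) * static_cast<int32_t>(query[i]);
        }
        results[c] = static_cast<float>(sum);
    }
}

// Thin entry points, analogous in spirit to the exported native functions.
void dot_bulk_sequential_sketch(const int8_t* data, const int8_t* query,
                                int32_t dims, int32_t pitch, int32_t count,
                                float* results) {
    bulk_dot_sketch<int8_t, sequential_mapper_sketch>(
        data, query, dims, pitch, nullptr, count, results);
}

void dot_bulk_offsets_sketch(const int8_t* data, const int8_t* query,
                             int32_t dims, int32_t pitch,
                             const int32_t* offsets, int32_t count,
                             float* results) {
    bulk_dot_sketch<int8_t, offsets_mapper_sketch>(
        data, query, dims, pitch, offsets, count, results);
}

void dot_bulk_sparse_sketch(const int8_t* const* addresses,
                            const int8_t* query, int32_t dims, int32_t count,
                            float* results) {
    bulk_dot_sketch<const int8_t*, sparse_mapper_sketch>(
        addresses, query, dims, 0, nullptr, count, results);
}
```

The point of the design is that the scoring loop is written once, and adding sparse support for a new element type reduces to one more instantiation with `const int8_t*` as TData plus a thin export.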

ChrisHegarty and others added 16 commits April 1, 2026 16:44
Add a TData template parameter to the BBQ (dotd1q4, dotd2q4, dotd4q4)
and INT4 (doti4) bulk scoring templates on both amd64 and aarch64 tier-1.
This aligns them with call_i8_bulk in vec_1.cpp, which already uses TData
to support sequential_mapper, offsets_mapper, and sparse_mapper through
the same template. No functional change — existing sequential and offsets
instantiations are updated to pass int8_t as TData explicitly.

Add vec_dot{d1q4,d2q4,d4q4,i4}_bulk_sparse native exports on amd64 and
aarch64 tier-1, using the TData/sparse_mapper template instantiation
introduced in a previous PR. Enable BULK_SPARSE for BBQ and INT4 in
JdkVectorLibrary with appropriate bounds checking, and add corresponding
method handles and wrapper methods in Similarities.

Includes unit tests for both BBQ and INT4 bulk sparse operations:
contiguous slices, scattered (non-contiguous) allocations, and illegal
argument validation.
@ChrisHegarty requested a review from ldematte April 3, 2026 11:06
@ChrisHegarty added the >refactoring, :Search Relevance/Vectors, and Team:Search Relevance labels Apr 3, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@ChrisHegarty added the test-arm label Apr 3, 2026
Contributor

@ldematte left a comment

Nice! LGTM

  if (f != Function.DOT_PRODUCT && type == DataType.INT4) continue;
- // BULK_SPARSE only for INT7U and INT8 — no native sparse functions exist for FLOAT32 or INT4
- if (op == Operation.BULK_SPARSE && (type == DataType.FLOAT32 || type == DataType.INT4)) continue;
+ // BULK_SPARSE only for INT7U, INT8, and INT4 — no native sparse functions exist for FLOAT32
Contributor

I think this is a gap we need to close, we should have BULK_SPARSE for all, but I think it's better to address it in a separate PR.

Contributor

Also, I thought we had BFLOAT16 too, but maybe these are wired differently. Worth a check.

@ChrisHegarty merged commit 8c52ed7 into elastic:main Apr 3, 2026
41 checks passed
@ChrisHegarty deleted the bbq_int4_bulk_sparse branch April 3, 2026 18:07
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
…5676)
Labels

>refactoring, :Search Relevance/Vectors, Team:Search Relevance, test-arm, v9.4.0


3 participants