
[Native] Use vdotq_s32 for int7u/int8 distances on ARM#144505

Merged
ldematte merged 5 commits into elastic:main from ldematte:native/arm-dotprod-v2 on Mar 19, 2026

Conversation

@ldematte (Contributor) commented Mar 18, 2026

Replace vmull_s8 + vpadalq_s16 with vdotq_s32 in int7/int8 native implementation on ARM.
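Why the swap is safe can be shown with a scalar model of the lane semantics. This is a sketch, not the PR's code: the intrinsics are emulated with plain loops so it runs on any CPU, and `step_mull_padal`, `step_sdot` and `dot_model` are hypothetical names. Per 16-byte chunk, the old path typically needs two widening multiplies (`vmull_s8`) and two pairwise accumulates (`vpadalq_s16`). `vdotq_s32` does the same work in one instruction: each 32-bit lane accumulates four adjacent byte products. The lanes end up holding different partial sums, but the horizontal total is identical. Because int7u values (0..127) fit in a signed byte, the signed form applies to them as well.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of one 16-byte step of the old path: vmull_s8 widens each
// 8-byte half into eight int16 products, then vpadalq_s16 adds adjacent
// int16 pairs into the four int32 accumulator lanes.
void step_mull_padal(std::array<int32_t, 4>& acc, const int8_t* a, const int8_t* b) {
    for (int half = 0; half < 2; ++half) {
        int16_t prod[8];
        for (int j = 0; j < 8; ++j) {
            // |a * b| <= 128 * 128 = 16384, so every product fits in int16.
            prod[j] = static_cast<int16_t>(a[half * 8 + j] * b[half * 8 + j]);
        }
        for (int lane = 0; lane < 4; ++lane) {
            acc[lane] += int32_t{prod[2 * lane]} + int32_t{prod[2 * lane + 1]};
        }
    }
}

// Scalar model of vdotq_s32: lane i accumulates the dot product of bytes
// 4*i .. 4*i+3 directly into int32, with no int16 intermediate.
void step_sdot(std::array<int32_t, 4>& acc, const int8_t* a, const int8_t* b) {
    for (int lane = 0; lane < 4; ++lane) {
        for (int k = 0; k < 4; ++k) {
            acc[lane] += int32_t{a[4 * lane + k]} * int32_t{b[4 * lane + k]};
        }
    }
}

// Dot product over n bytes. n must be a multiple of 16 in this sketch; a real
// kernel also needs a scalar tail. The final lane sum models a horizontal add
// such as vaddvq_s32.
int32_t dot_model(const int8_t* a, const int8_t* b, size_t n, bool use_sdot) {
    assert(n % 16 == 0);
    std::array<int32_t, 4> acc{};
    for (size_t i = 0; i < n; i += 16) {
        if (use_sdot) {
            step_sdot(acc, a + i, b + i);
        } else {
            step_mull_padal(acc, a + i, b + i);
        }
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

For `dims=1024`, both `dot_model(a, b, 1024, false)` and `dot_model(a, b, 1024, true)` return the same value as a naive loop.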

Benchmarks:

Single-vector throughput (ops/µs, dims=1024)

| Benchmark | Function | Baseline | vdotq_s32 | Change |
|---|---|---:|---:|---:|
| score | DOT_PRODUCT | 18.63 | 26.08 | +40% |
| scoreQuery | DOT_PRODUCT | 17.30 | 26.43 | +53% |
| score | COSINE | 18.49 | 26.62 | +44% |
| scoreQuery | COSINE | 17.61 | 25.38 | +44% |

Bulk throughput (ops/s, dims=1024, numVectors=1500, bulkSize=32)

| Benchmark | Function | Baseline | vdotq_s32 | Change |
|---|---|---:|---:|---:|
| scoreMultipleRandomBulk | DOT_PRODUCT | 13,628 | 19,705 | +45% |
| scoreMultipleRandomBulk | COSINE | 13,831 | 20,127 | +46% |
| scoreMultipleSequentialBulk | DOT_PRODUCT | 14,183 | 21,455 | +51% |
| scoreMultipleSequentialBulk | COSINE | 13,982 | 21,683 | +55% |
| scoreQueryMultipleRandomBulk | DOT_PRODUCT | 14,251 | 20,664 | +45% |
| scoreQueryMultipleRandomBulk | COSINE | 13,554 | 21,818 | +61% |
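Every row of the Change column matches the throughput ratio minus one. This is a reading of the numbers; the PR does not state the formula. The first two lines below work through two examples. Under the assumption that one single-vector op is one score, the last line inverts the `score DOT_PRODUCT` throughput to get an approximate time per score.

```math
\begin{aligned}
\text{Change} &= \frac{\text{vdotq\_s32}}{\text{Baseline}} - 1,
  \qquad \frac{26.43}{17.30} - 1 \approx 0.528 \;(+53\%),
  \qquad \frac{21{,}818}{13{,}554} - 1 \approx 0.610 \;(+61\%) \\
t_{\text{score}} &= \frac{1}{18.63\ \text{ops}/\mu\text{s}} \approx 53.7\ \text{ns}
  \;\longrightarrow\; \frac{1}{26.08\ \text{ops}/\mu\text{s}} \approx 38.3\ \text{ns}
\end{aligned}
```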

Replace vmull_s8 + vpadalq_s16 with vdotq_s32 in doti8_inner,
cosi8_inner, cosi8_inner_bulk, and call_i8_bulk (dot path).
Make call_i8_bulk accumulator type parametric via acc_ops traits.
Bump -march to armv8.2-a+dotprod and add HWCAP_ASIMDDP runtime
check in caps.cpp.
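The commit message says `call_i8_bulk` now takes its accumulator type from `acc_ops` traits. The real traits are not shown here, so the sketch below is hypothetical: `Lanes4`, the `zero`/`step`/`reduce` members and `bulk_dot_sketch` are illustrative names, and the NEON register is modelled with a plain array. It shows the general pattern: the bulk loop is written once, and each accumulator type plugs in its own initialise, accumulate and reduce steps.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical accumulator modelling one int32x4_t register (four int32 lanes).
struct Lanes4 {
    std::array<int32_t, 4> v{};
};

// Hypothetical traits: each accumulator type supplies zero(), step() over a
// 16-byte chunk, and reduce() to a single int32 score.
template <typename Acc>
struct acc_ops;

template <>
struct acc_ops<Lanes4> {
    static Lanes4 zero() { return {}; }
    // Same lane grouping as vdotq_s32: lane i gets bytes 4*i .. 4*i+3.
    static void step(Lanes4& acc, const int8_t* a, const int8_t* b) {
        for (int lane = 0; lane < 4; ++lane)
            for (int k = 0; k < 4; ++k)
                acc.v[lane] += int32_t{a[4 * lane + k]} * int32_t{b[4 * lane + k]};
    }
    static int32_t reduce(const Lanes4& acc) {
        return acc.v[0] + acc.v[1] + acc.v[2] + acc.v[3];
    }
};

// A plain scalar accumulator, for example for a portable fallback.
template <>
struct acc_ops<int32_t> {
    static int32_t zero() { return 0; }
    static void step(int32_t& acc, const int8_t* a, const int8_t* b) {
        for (int k = 0; k < 16; ++k) acc += int32_t{a[k]} * int32_t{b[k]};
    }
    static int32_t reduce(int32_t acc) { return acc; }
};

// Hypothetical bulk kernel: scores one query against `count` documents.
// dims must be a multiple of 16 in this sketch.
template <typename Acc>
void bulk_dot_sketch(const int8_t* query, const int8_t* const* docs, size_t count,
                     size_t dims, int32_t* out) {
    for (size_t d = 0; d < count; ++d) {
        Acc acc = acc_ops<Acc>::zero();
        for (size_t i = 0; i < dims; i += 16) {
            acc_ops<Acc>::step(acc, query + i, docs[d] + i);
        }
        out[d] = acc_ops<Acc>::reduce(acc);
    }
}
```

`bulk_dot_sketch<Lanes4>` and `bulk_dot_sketch<int32_t>` produce the same scores. Only the accumulator, and therefore the instruction sequence a real kernel would emit, changes.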

Made-with: Cursor
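The `HWCAP_ASIMDDP` runtime check in the commit message guards the new `-march=armv8.2-a+dotprod` build. SDOT is an optional Armv8.2 feature, so code that uses it would fault on older AArch64 cores. Below is a minimal sketch of such a check on Linux using `getauxval(AT_HWCAP)`. `has_dotprod`, `dot_scalar` and `select_dot` are hypothetical names, not the contents of caps.cpp. On platforms other than Linux/AArch64 the sketch conservatively reports `false`.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_ASIMDDP
#define HWCAP_ASIMDDP (1 << 20)  // Linux arm64 hwcap bit for SDOT/UDOT
#endif
#endif

// True when the kernel reports the Armv8.2 dot-product extension.
bool has_dotprod() {
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMDDP) != 0;
#else
    return false;
#endif
}

using dot_fn = int32_t (*)(const int8_t*, const int8_t*, size_t);

// Portable fallback used when the extension is missing.
int32_t dot_scalar(const int8_t* a, const int8_t* b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) sum += int32_t{a[i]} * int32_t{b[i]};
    return sum;
}

// Hypothetical dispatcher: use the dotprod kernel only if the CPU supports it.
dot_fn select_dot(bool dotprod_available, dot_fn dotprod_kernel) {
    return dotprod_available ? dotprod_kernel : dot_scalar;
}
```

A caller would resolve the function pointer once at startup, for example `select_dot(has_dotprod(), my_sdot_kernel)`, where `my_sdot_kernel` is a placeholder. This avoids repeating the check on every score.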
ldematte requested a review from a team as a code owner March 18, 2026 16:05
elasticsearchmachine added the Team:Search Relevance and v9.4.0 labels Mar 18, 2026
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine (Collaborator)

Hi @ldematte, I've created a changelog YAML for you.

@ChrisHegarty (Contributor) left a comment

LGTM

ldematte merged commit cc5db9d into elastic:main Mar 19, 2026
36 checks passed
ldematte deleted the native/arm-dotprod-v2 branch March 20, 2026 07:41
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
Replace vmull_s8 + vpadalq_s16 with vdotq_s32 in int7/int8 native implementation on ARM.
Benchmarks show a 40 to 60% speedup in scoring.

Labels

>enhancement, :Search Relevance/Vectors, Team:Search Relevance, v9.4.0


3 participants