Fix bulk scoring to process last batch instead of falling through to scalar tail #145316
Merged
ldematte merged 4 commits into elastic:main on Mar 31, 2026
Pinging @elastic/es-search-relevance (Team:Search Relevance)
thecoop reviewed on Mar 31, 2026
thecoop approved these changes on Mar 31, 2026
szybia added a commit to szybia/elasticsearch that referenced this pull request on Mar 31, 2026
…rics
* upstream/main: (21 commits)
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
[DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
Use checked exceptions in entitlement constructor rules (elastic#145234)
ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
Make Settings.Builder.remove() fluent (elastic#145294)
Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
Fix flaky SecurityFeatureResetTests (elastic#145063)
[DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
Skip certain metric assertions on Windows (elastic#144933)
[ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
[Transform] Track Linked Projects (elastic#144399)
Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
Clean up TickerScheduleEngineTests (elastic#145303)
[CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
...
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request on Apr 1, 2026
…scalar tail (elastic#145316)
This PR fixes a small issue in bulk scoring functions where the last batch of vectors was unnecessarily dropped to the single-vector tail loop.
Bulk loops used c + 2 * batches - 1 < count as the loop condition, which exits when there aren't enough vectors for both the current batch AND a next batch to prefetch. This means the last full batch (where there's no next batch to prefetch) was always processed one-by-one in the scalar tail.
This PR changes the loop condition to c + batches - 1 < count (process all full batches), and guards the prefetch with const bool has_next = c + 2 * batches - 1 < count. This pattern was already used in vec_i4_2.cpp (AVX-512 int4) and is now applied consistently everywhere.
Also fixes > to >= in SIMD stride checks across all files, so that when dims equals exactly the stride length, we use the SIMD path instead of falling through to scalar.
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request on Apr 9, 2026
…scalar tail (elastic#145316)
This PR fixes a small issue in bulk scoring functions where the last batch of vectors was unnecessarily dropped to the single-vector tail loop.

Bulk loops used `c + 2 * batches - 1 < count` as the loop condition, which exits when there aren't enough vectors for both the current batch AND a next batch to prefetch. This means the last full batch (where there's no next batch to prefetch) was always processed one-by-one in the scalar tail.

This PR changes the loop condition to `c + batches - 1 < count` (process all full batches), and guards the prefetch with `const bool has_next = c + 2 * batches - 1 < count`. This pattern was already used in `vec_i4_2.cpp` (AVX-512 int4) and is now applied consistently everywhere.

Also fixes `>` to `>=` in SIMD stride checks across all files, so that when `dims` equals exactly the stride length, we use the SIMD path instead of falling through to scalar.

Relates to #145411
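To make the effect of the loop-condition change concrete, here is a minimal sketch that only counts how vectors are split between the bulk loop and the scalar tail. It is not the Elasticsearch source: `LoopStats`, `old_loop`, and `new_loop` are hypothetical names, the scoring and prefetch work is replaced by counters, and `batches` is assumed to be at least 1.

```cpp
#include <cstddef>

// Hypothetical bookkeeping type; the real loops score vectors instead of counting.
struct LoopStats {
    std::size_t bulk_vectors = 0;  // vectors handled inside the batched loop
    std::size_t tail_vectors = 0;  // vectors left for the one-at-a-time tail
    std::size_t prefetches = 0;    // next-batch prefetches that would be issued
};

// Old shape: iterate only while a current batch AND a next batch both fit.
inline LoopStats old_loop(std::size_t count, std::size_t batches) {
    LoopStats s;
    std::size_t c = 0;
    for (; c + 2 * batches - 1 < count; c += batches) {
        s.prefetches++;             // a next batch always exists here
        s.bulk_vectors += batches;  // score the current batch
    }
    s.tail_vectors = count - c;     // includes the last full batch, if any
    return s;
}

// New shape: iterate over every full batch, prefetch only if a next batch exists.
inline LoopStats new_loop(std::size_t count, std::size_t batches) {
    LoopStats s;
    std::size_t c = 0;
    for (; c + batches - 1 < count; c += batches) {
        const bool has_next = c + 2 * batches - 1 < count;
        if (has_next) {
            s.prefetches++;         // guarded: never reaches past the last full batch
        }
        s.bulk_vectors += batches;  // score the current batch
    }
    s.tail_vectors = count - c;     // strictly fewer than `batches` vectors remain
    return s;
}
```

With `count = 10` and `batches = 4`, `old_loop` handles 4 vectors in bulk and leaves 6 for the tail, while `new_loop` handles 8 in bulk and leaves 2. Both issue a single prefetch, so the guard keeps the prefetch pattern unchanged while recovering the last full batch.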
Test plan
- `JDKVectorLibrary*Tests` pass locally on Apple Silicon (aarch64)
- `JDKVectorLibraryInt8Tests` pass on AMD c8a (x64 AVX-512)
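The `>` to `>=` stride-check fix from the description can be illustrated with a portable stand-in. This sketch assumes a hypothetical stride of 16 elements (`kStride`) and uses a plain inner loop where the real code would use SIMD intrinsics; `dot_i8` and `DotResult` are illustrative names, not functions from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 16 int8 elements per wide step; real strides depend on the ISA.
constexpr std::size_t kStride = 16;

struct DotResult {
    std::int32_t value;   // the dot product
    std::size_t strided;  // how many elements went through the wide path
};

// Inclusive == true models the fixed check (dims >= stride);
// Inclusive == false models the old check (dims > stride).
template <bool Inclusive>
DotResult dot_i8(const std::int8_t* a, const std::int8_t* b, std::size_t dims) {
    std::int32_t acc = 0;
    std::size_t i = 0;
    const bool wide = Inclusive ? (dims >= kStride) : (dims > kStride);
    if (wide) {
        // Round down to a multiple of the stride; stands in for the SIMD loop.
        const std::size_t limit = dims - (dims % kStride);
        for (; i < limit; i += kStride) {
            for (std::size_t j = 0; j < kStride; ++j) {
                acc += static_cast<std::int32_t>(a[i + j]) * static_cast<std::int32_t>(b[i + j]);
            }
        }
    }
    const std::size_t strided = i;
    // Scalar tail for whatever the wide path did not cover.
    for (; i < dims; ++i) {
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return {acc, strided};
}
```

Both variants return the same dot product. The only difference shows up when `dims` equals the stride exactly: the old check sends all 16 elements to the scalar tail, whereas the inclusive check routes them through the wide path.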