Fix VectorScorerOSQBenchmarkTests data generation flakiness by ldematte · Pull Request #142952 · elastic/elasticsearch

ldematte · 2026-02-24T14:11:56Z

Summary

The scalar and vectorized benchmark tests generated input data independently using the same Random seed. SIMD reduction order changes across JIT compilation tiers (e.g., reduceLanes(ADD) switching from sequential to pairwise reduction) caused the quantized data to diverge slightly between the two setup() calls, leading to >10% score differences on certain seeds.
Extracted data generation into a static generateBenchmarkData() method returning a BenchmarkData record. Both test benchmarks now share the same record, ensuring identical input data.
Added an offset-accepting overload of writeBulkOSQVectorData to avoid array copies when writing slices of the flat index vector array.
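
As a rough illustration of the fix described above, the sketch below generates the inputs once and hands the same record to two scorers. It is not the code from this PR: the record fields, the `generateBenchmarkData` parameters, and both `score*` methods are hypothetical stand-ins, and the "vectorized" path is imitated with a plain four-accumulator loop rather than the real SIMD scorer.

```java
import java.util.Random;

public class SharedBenchmarkDataSketch {

    // Hypothetical stand-in for the PR's BenchmarkData record; the real fields are not shown in this PR text.
    record BenchmarkData(int dims, float[] vectors, float[] query) {}

    // Generates all inputs exactly once from a seed (parameters are assumptions).
    static BenchmarkData generateBenchmarkData(long seed, int dims, int numVectors) {
        Random random = new Random(seed);
        float[] vectors = new float[dims * numVectors];
        for (int i = 0; i < vectors.length; i++) {
            vectors[i] = random.nextFloat();
        }
        float[] query = new float[dims];
        for (int i = 0; i < dims; i++) {
            query[i] = random.nextFloat();
        }
        return new BenchmarkData(dims, vectors, query);
    }

    // Stand-in for the scalar scorer: one accumulator, strictly left to right.
    static float scoreScalar(BenchmarkData data, int ordinal) {
        int base = ordinal * data.dims();
        float sum = 0f;
        for (int i = 0; i < data.dims(); i++) {
            sum += data.vectors()[base + i] * data.query()[i];
        }
        return sum;
    }

    // Stand-in for the vectorized scorer: four accumulators combined at the end.
    static float scoreUnrolled(BenchmarkData data, int ordinal) {
        int base = ordinal * data.dims();
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int i = 0;
        for (; i + 3 < data.dims(); i += 4) {
            s0 += data.vectors()[base + i] * data.query()[i];
            s1 += data.vectors()[base + i + 1] * data.query()[i + 1];
            s2 += data.vectors()[base + i + 2] * data.query()[i + 2];
            s3 += data.vectors()[base + i + 3] * data.query()[i + 3];
        }
        float sum = (s0 + s1) + (s2 + s3);
        for (; i < data.dims(); i++) {
            sum += data.vectors()[base + i] * data.query()[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both scorers read the same record, so any score gap comes from the scorers, not the inputs.
        BenchmarkData shared = generateBenchmarkData(42L, 384, 4);
        for (int ord = 0; ord < 4; ord++) {
            System.out.println(scoreScalar(shared, ord) + " vs " + scoreUnrolled(shared, ord));
        }
    }
}
```

With shared inputs, the remaining difference between the two paths is limited to summation rounding in the scorers themselves, which is far below the 10% threshold mentioned in the summary.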

Test plan

Verified the previously failing seeds now pass: E503FB0D7B878481 (p0=384 p1=4 NIO COSINE) and FDBBE54D76A7E1AB (p0=1024 p1=1 NIO DOT_PRODUCT)
Full VectorScorerOSQBenchmarkTests suite passes with --tests.iters=3
ESNextOSQVectorsScorerTests (existing caller of writeBulkOSQVectorData) still passes
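
The offset-accepting overload mentioned in the summary follows a common Java pattern, sketched below under stated assumptions. The real signature of `writeBulkOSQVectorData` is not shown in this PR text, so the `writeVector` methods, the `byte[]` payload, and the `OutputStream` target here are all hypothetical; only `OutputStream.write(byte[], int, int)` and `Arrays.copyOfRange` are real JDK calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

public class OffsetOverloadSketch {

    // Original-style method (hypothetical): writes a whole array.
    static void writeVector(OutputStream out, byte[] vector) throws IOException {
        writeVector(out, vector, 0, vector.length);
    }

    // Offset-accepting overload (hypothetical): writes a slice of a larger flat array in place.
    static void writeVector(OutputStream out, byte[] flat, int offset, int length) throws IOException {
        Objects.checkFromIndexSize(offset, length, flat.length);
        out.write(flat, offset, length);
    }

    // Writes every dims-sized slice by first copying it into a temporary array.
    static byte[] writeAllWithCopies(byte[] flat, int dims) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int offset = 0; offset < flat.length; offset += dims) {
                writeVector(out, Arrays.copyOfRange(flat, offset, offset + dims));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Writes every dims-sized slice directly from the flat array, with no temporary copies.
    static byte[] writeAllWithOffsets(byte[] flat, int dims) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int offset = 0; offset < flat.length; offset += dims) {
                writeVector(out, flat, offset, dims);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] flat = new byte[] {1, 2, 3, 4, 5, 6, 7, 8};
        boolean same = Arrays.equals(writeAllWithCopies(flat, 4), writeAllWithOffsets(flat, 4));
        System.out.println("identical output: " + same);
    }
}
```

Keeping the array-only method as a thin delegate means an existing caller can stay unchanged while new callers pass an offset.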

Made with Cursor

The scalar and vectorized benchmarks generated their input data independently using the same Random seed. SIMD reduction order changes across JIT compilation tiers (e.g. reduceLanes(ADD) switching from sequential to pairwise reduction) caused the quantized data to diverge slightly between the two setup() calls, leading to >10% score differences on certain seeds.

Extract data generation into a static generateBenchmarkData() method that returns a BenchmarkData record; both test benchmarks now share the same record so the input data is identical.

Fixes elastic#142881
Fixes elastic#142883

Co-authored-by: Cursor <cursoragent@cursor.com>
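
The reduction-order effect described in the commit message comes from floating-point addition not being associative. The sketch below imitates the two orders in plain Java, without the Vector API; the lane values are chosen purely for illustration, and the method names are hypothetical.

```java
public class ReductionOrderSketch {

    // Sequential reduction: ((v0 + v1) + v2) + v3, one running accumulator.
    static float sumSequential(float[] v) {
        float sum = 0f;
        for (float x : v) {
            sum += x;
        }
        return sum;
    }

    // Pairwise (tree) reduction over [from, to): (v0 + v1) + (v2 + v3) for four lanes.
    static float sumPairwise(float[] v, int from, int to) {
        if (to - from == 0) {
            return 0f;
        }
        if (to - from == 1) {
            return v[from];
        }
        int mid = (from + to) >>> 1;
        return sumPairwise(v, from, mid) + sumPairwise(v, mid, to);
    }

    public static void main(String[] args) {
        // Illustrative lanes: large values that cancel, plus small values that can be rounded away.
        float[] lanes = {1e8f, 1f, -1e8f, 1f};
        System.out.println("sequential: " + sumSequential(lanes));
        System.out.println("pairwise:   " + sumPairwise(lanes, 0, lanes.length));
    }
}
```

Real inputs usually differ by far less than this contrived case, but a difference of even one unit in the last place can move a value across a quantization boundary, which is consistent with the slight divergence in quantized data described above.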

elasticsearchmachine · 2026-02-24T14:15:11Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

...hmarks/src/main/java/org/elasticsearch/benchmark/vector/scorer/VectorScorerOSQBenchmark.java


elasticsearchmachine added the v9.4.0 and needs:triage labels Feb 24, 2026

ldematte added the >test and :Search Relevance/Vectors labels Feb 24, 2026

elasticsearchmachine added the Team:Search Relevance label and removed the needs:triage label Feb 24, 2026

thecoop reviewed Feb 24, 2026

...hmarks/src/main/java/org/elasticsearch/benchmark/vector/scorer/VectorScorerOSQBenchmark.java Outdated

ldematte added 2 commits February 24, 2026 18:44

Renaming

18d70be

Merge remote-tracking branch 'upstream/main' into fix-vector-scorer-benchmark-data-generation

05497df

thecoop approved these changes Feb 26, 2026

ldematte merged commit 20eea4c into elastic:main Feb 26, 2026
35 checks passed

ldematte deleted the fix-vector-scorer-benchmark-data-generation branch February 26, 2026 16:18

prwhelan mentioned this pull request Feb 27, 2026

[Transform] Clean up internal tests #143246

Merged

ldematte merged 3 commits into elastic:main from ldematte:fix-vector-scorer-benchmark-data-generation
