
Fix VectorScorerOSQBenchmarkTests data generation flakiness #142952

Merged
ldematte merged 3 commits into elastic:main from ldematte:fix-vector-scorer-benchmark-data-generation
Feb 26, 2026


@ldematte
Contributor

Summary

  • The scalar and vectorized benchmark tests generated their input data independently, each from a Random seeded with the same value. Because floating-point addition is not associative, a change in SIMD reduction order across JIT compilation tiers (e.g., reduceLanes(ADD) switching from sequential to pairwise reduction) caused the quantized data to diverge slightly between the two setup() calls, which led to score differences above 10% on certain seeds.
  • Extracted data generation into a static generateBenchmarkData() method that returns a BenchmarkData record. The scalar and vectorized benchmarks now share the same record, so their input data is identical by construction.
  • Added an offset-accepting overload of writeBulkOSQVectorData to avoid array copies when writing slices of the flat index vector array.
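To see why a change in reduction order alone can perturb the data, the plain-Java sketch below (not the Vector API, and not code from this PR) sums the same four floats left to right and as a pairwise tree. The values are chosen to make rounding visible: near 1e8 the spacing between adjacent floats is 8, so 1e8f + 1f rounds back to 1e8f.

```java
public class ReductionOrderDemo {
    // Left-to-right accumulation: ((a0 + a1) + a2) + a3 ...
    static float sequentialSum(float[] values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    // Recursive pairwise (tree) reduction over values[from, to):
    // for four elements this computes (a0 + a1) + (a2 + a3).
    static float pairwiseSum(float[] values, int from, int to) {
        int n = to - from;
        if (n == 0) {
            return 0f;
        }
        if (n == 1) {
            return values[from];
        }
        int mid = from + n / 2;
        return pairwiseSum(values, from, mid) + pairwiseSum(values, mid, to);
    }

    public static void main(String[] args) {
        float[] values = {1e8f, 1f, -1e8f, 1f};
        System.out.println(sequentialSum(values));                 // 1.0
        System.out.println(pairwiseSum(values, 0, values.length)); // 0.0
    }
}
```

The two orders give 1.0 and 0.0. In the benchmark the discrepancies are far smaller, but a tiny shift in an intermediate value can be enough to move a quantized value across a rounding boundary.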

Fixes #142881
Fixes #142883
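The data-sharing change from the summary can be sketched as follows. Only the names generateBenchmarkData and BenchmarkData come from the PR; the record's fields, the generator's parameters, and the Benchmark wrapper are hypothetical, and both wrappers use the same scalar dot product purely for illustration.

```java
import java.util.Random;

public class SharedBenchmarkDataSketch {
    // Hypothetical shape of the BenchmarkData record: the PR does not list its fields.
    record BenchmarkData(float[][] indexVectors, float[] queryVector) {}

    // Hypothetical generator: all random draws happen here exactly once.
    // In the real test this is presumably also where quantization happens.
    static BenchmarkData generateBenchmarkData(long seed, int dims, int numVectors) {
        Random random = new Random(seed);
        float[][] vectors = new float[numVectors][dims];
        for (float[] vector : vectors) {
            for (int i = 0; i < dims; i++) {
                vector[i] = random.nextFloat();
            }
        }
        float[] query = new float[dims];
        for (int i = 0; i < dims; i++) {
            query[i] = random.nextFloat();
        }
        return new BenchmarkData(vectors, query);
    }

    // Hypothetical benchmark wrapper: it receives data instead of generating its own in setup().
    static final class Benchmark {
        private final BenchmarkData data;

        Benchmark(BenchmarkData data) {
            this.data = data;
        }

        // Plain scalar dot product; the real scalar and vectorized scorers differ.
        float score(int ord) {
            float[] vector = data.indexVectors()[ord];
            float[] query = data.queryVector();
            float dot = 0f;
            for (int i = 0; i < vector.length; i++) {
                dot += vector[i] * query[i];
            }
            return dot;
        }

        BenchmarkData data() {
            return data;
        }
    }

    public static void main(String[] args) {
        BenchmarkData shared = generateBenchmarkData(42L, 8, 4);
        Benchmark scalar = new Benchmark(shared);
        Benchmark vectorized = new Benchmark(shared);
        // Both benchmarks see the very same arrays, so their inputs cannot diverge.
        System.out.println(scalar.data() == vectorized.data()); // true
    }
}
```

Because the data is produced once and passed in, any score difference between the two benchmarks can only come from the scorers themselves, which is what the test is meant to compare.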

Test plan

  • Verified that the previously failing seeds now pass: E503FB0D7B878481 (p0=384, p1=4, NIO, COSINE) and FDBBE54D76A7E1AB (p0=1024, p1=1, NIO, DOT_PRODUCT)
  • The full VectorScorerOSQBenchmarkTests suite passes with --tests.iters=3
  • ESNextOSQVectorsScorerTests (existing caller of writeBulkOSQVectorData) still passes
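The offset-accepting overload of writeBulkOSQVectorData follows a common pattern: instead of copying a slice with Arrays.copyOfRange and passing the copy, the caller passes the original array plus a start index and a count. The PR does not show the method's signature, so this sketch uses a hypothetical writeVectors method over a flat float[] and a DataOutputStream as stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

public class OffsetOverloadSketch {
    // Hypothetical whole-array entry point: delegates to the offset overload.
    static void writeVectors(DataOutputStream out, float[] flat, int dims) throws IOException {
        writeVectors(out, flat, dims, 0, flat.length / dims);
    }

    // Hypothetical offset overload: writes `count` vectors of `dims` floats,
    // starting at vector index `offset`, directly from the flat array (no copy).
    static void writeVectors(DataOutputStream out, float[] flat, int dims, int offset, int count)
        throws IOException {
        int start = offset * dims;
        int length = count * dims;
        Objects.checkFromIndexSize(start, length, flat.length);
        for (int i = start; i < start + length; i++) {
            out.writeFloat(flat[i]);
        }
    }

    // Demo helper: serializes a slice and returns the written bytes.
    static byte[] sliceBytes(float[] flat, int dims, int offset, int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeVectors(out, flat, dims, offset, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        float[] flat = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f}; // four 2-dimensional vectors
        byte[] viaOffset = sliceBytes(flat, 2, 1, 2);  // vectors 1 and 2, no copy
        byte[] viaCopy = sliceBytes(Arrays.copyOfRange(flat, 2, 6), 2, 0, 2); // copy first
        System.out.println(Arrays.equals(viaOffset, viaCopy)); // true
        System.out.println(viaOffset.length);                  // 16
    }
}
```

Keeping the original signature as a thin wrapper over the offset overload is one way to leave existing callers untouched while new callers write slices without allocating a temporary array each time.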

Made with Cursor

The scalar and vectorized benchmarks generated their input data
independently using the same Random seed. SIMD reduction order
changes across JIT compilation tiers (e.g. reduceLanes(ADD)
switching from sequential to pairwise reduction) caused the
quantized data to diverge slightly between the two setup() calls,
leading to >10% score differences on certain seeds.

Extract data generation into a static generateBenchmarkData()
method that returns a BenchmarkData record; both test benchmarks
now share the same record so the input data is identical.

Fixes elastic#142881
Fixes elastic#142883

Co-authored-by: Cursor <cursoragent@cursor.com>
@elasticsearchmachine added the v9.4.0 and needs:triage labels Feb 24, 2026
@ldematte added the >test and :Search Relevance/Vectors labels Feb 24, 2026
@elasticsearchmachine added the Team:Search Relevance label and removed the needs:triage label Feb 24, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@ldematte merged commit 20eea4c into elastic:main Feb 26, 2026
35 checks passed
@ldematte deleted the fix-vector-scorer-benchmark-data-generation branch February 26, 2026 16:18
PeteGillinElastic pushed a commit to PeteGillinElastic/elasticsearch that referenced this pull request Feb 27, 2026
…142952)

* Fix VectorScorerOSQBenchmarkTests data generation flakiness

* Renaming

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

Labels

  • :Search Relevance/Vectors (Vector search)
  • Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
  • >test (Issues or PRs that are addressing/adding tests)
  • v9.4.0


3 participants