Optimize grouping for segment concurrent search by ensuring that documents within each group are as equal as possible by kkewwei · Pull Request #18451 · opensearch-project/OpenSearch

kkewwei · 2025-06-05T17:48:10Z

Description

During segment concurrent search, we adopt a round-robin approach to distribute segments across slices. However, this can lead to a significant imbalance in the number of documents across groups. For instance, consider segments with the following document counts: 10, 8, 7, 6, 5, 4, and a slice count of 4.

Current grouping results (max-min document difference: 15 - 6 = 9):
group0: 10, 5 (total 15)
group1: 8, 4 (total 12)
group2: 7 (total 7)
group3: 6 (total 6)

Optimized grouping results (max-min difference reduced to 11 - 8 = 3):
group0: 10 (total 10)
group1: 8 (total 8)
group2: 7, 4 (total 11)
group3: 6, 5 (total 11)
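The two groupings above can be reproduced with a short Java sketch. It assumes, as the example implies, that segments are first sorted by document count in descending order. `roundRobin` then deals them out in turn, while `greedy` places each segment into the group with the smallest running total (the longest-processing-time-first heuristic). The class and method names (`SliceGrouping`, `roundRobin`, `greedy`, `spread`) are hypothetical and do not correspond to the actual OpenSearch slice supplier code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch comparing two ways of grouping per-segment doc counts into slices. */
final class SliceGrouping {

    /** Sorts counts descending and deals them out round-robin. */
    static List<List<Integer>> roundRobin(int[] docCounts, int sliceCount) {
        List<List<Integer>> groups = newGroups(sliceCount);
        int[] sorted = sortedDescending(docCounts);
        for (int i = 0; i < sorted.length; i++) {
            groups.get(i % sliceCount).add(sorted[i]);
        }
        return groups;
    }

    /** Sorts counts descending and adds each one to the group with the smallest total so far. */
    static List<List<Integer>> greedy(int[] docCounts, int sliceCount) {
        List<List<Integer>> groups = newGroups(sliceCount);
        long[] totals = new long[sliceCount];
        // Min-heap of group indices ordered by current total; ties go to the lower index.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
            Comparator.<Integer>comparingLong(g -> totals[g]).thenComparingInt(g -> g));
        for (int g = 0; g < sliceCount; g++) {
            heap.add(g);
        }
        for (int count : sortedDescending(docCounts)) {
            int g = heap.poll();      // least-loaded group
            groups.get(g).add(count);
            totals[g] += count;       // update while g is out of the heap, then re-insert
            heap.add(g);
        }
        return groups;
    }

    /** Largest group total minus smallest group total. */
    static long spread(List<List<Integer>> groups) {
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        for (List<Integer> group : groups) {
            long sum = 0;
            for (int c : group) {
                sum += c;
            }
            max = Math.max(max, sum);
            min = Math.min(min, sum);
        }
        return max - min;
    }

    private static List<List<Integer>> newGroups(int n) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            groups.add(new ArrayList<>());
        }
        return groups;
    }

    private static int[] sortedDescending(int[] values) {
        return Arrays.stream(values).boxed()
            .sorted(Comparator.reverseOrder())
            .mapToInt(Integer::intValue)
            .toArray();
    }

    public static void main(String[] args) {
        int[] docCounts = {10, 8, 7, 6, 5, 4};
        List<List<Integer>> rr = roundRobin(docCounts, 4);
        List<List<Integer>> lpt = greedy(docCounts, 4);
        System.out.println("round-robin " + rr + " spread=" + spread(rr));
        // round-robin [[10, 5], [8, 4], [7], [6]] spread=9
        System.out.println("greedy " + lpt + " spread=" + spread(lpt));
        // greedy [[10], [8], [7, 4], [6, 5]] spread=3
    }
}
```

A heap keeps each placement at O(log k) for k slices; a linear scan over the k totals would also work when k is small.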

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2025-06-05T18:35:30Z

❌ Gradle check result for 6967a61: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-06-06T01:29:11Z

❌ Gradle check result for 12b64ba: FAILURE

github-actions · 2025-07-28T18:53:30Z

❌ Gradle check result for 17a02f6: FAILURE

github-actions · 2025-07-28T20:20:56Z

❕ Gradle check result for 17a02f6: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions · 2025-07-29T09:25:17Z

✅ Gradle check result for c36c8ca: SUCCESS

kkewwei · 2025-07-29T10:00:57Z

@asimmahmood1 @expani The checks are passing now; please help merge when you have time.

expani · 2025-07-30T18:49:04Z

@jainankitk please review this as we need another maintainer to merge it.

jainankitk

Thanks @kkewwei for making this change. I was initially wondering whether the greedy approach gives the optimal solution for this problem, but after running a few cases/permutations in my head, I am indeed convinced. There should also be a simple proof by contradiction. Having the leaves reverse-sorted is important!!
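On the optimality question: assigning the largest remaining segment to the least-loaded group is the classic LPT (longest processing time first) rule. For multiway number partitioning it is a heuristic with known approximation bounds rather than an exact algorithm; the exact problem is NP-hard. The hypothetical sketch below (`GreedyVsOptimal` and its methods are illustrative names only) compares the greedy spread with an exhaustive search on a tiny input where the two disagree. Reverse sorting is indeed important for the quality of the greedy result, but it does not guarantee an optimal partition.

```java
import java.util.Arrays;

/** Hypothetical check: greedy LPT spread versus the best spread found by exhaustive search. */
final class GreedyVsOptimal {

    /** Greedy LPT: visit counts largest first, add each to the currently smallest group. */
    static long greedySpread(int[] counts, int groups) {
        int[] sorted = counts.clone();
        Arrays.sort(sorted); // ascending, so iterate backwards for descending order
        long[] totals = new long[groups];
        for (int i = sorted.length - 1; i >= 0; i--) {
            int smallest = 0;
            for (int g = 1; g < groups; g++) {
                if (totals[g] < totals[smallest]) {
                    smallest = g;
                }
            }
            totals[smallest] += sorted[i];
        }
        return spread(totals);
    }

    /** Tries all groups^n assignments; only feasible for tiny inputs. */
    static long optimalSpread(int[] counts, int groups) {
        return search(counts, 0, new long[groups]);
    }

    private static long search(int[] counts, int index, long[] totals) {
        if (index == counts.length) {
            return spread(totals);
        }
        long best = Long.MAX_VALUE;
        for (int g = 0; g < totals.length; g++) {
            totals[g] += counts[index];
            best = Math.min(best, search(counts, index + 1, totals));
            totals[g] -= counts[index]; // backtrack
        }
        return best;
    }

    private static long spread(long[] totals) {
        long max = Arrays.stream(totals).max().getAsLong();
        long min = Arrays.stream(totals).min().getAsLong();
        return max - min;
    }

    public static void main(String[] args) {
        int[] counts = {3, 3, 2, 2, 2};
        System.out.println("greedy spread = " + greedySpread(counts, 2));   // 2: {3, 2, 2} vs {3, 2}
        System.out.println("optimal spread = " + optimalSpread(counts, 2)); // 0: {3, 3} vs {2, 2, 2}
    }
}
```

The exhaustive search is exponential in the number of segments, so a cheap greedy pass remains the practical choice for slicing even though it can miss the best split.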

jainankitk · 2025-07-31T21:39:29Z

> While this holds true for lots of workloads, there can be cases where it is sub-optimal, which IMO is difficult to prove via benchmarks because the outcome can shift with the data distribution from one case to another.

That's a fair point, but even the existing implementation doesn't necessarily account for this.

> A better approach could be work stealing via a shared queue among all search threads, but it would require changes in Lucene's IndexSearcher (more discussion in #18338 (comment)) or in OpenSearch's ContextIndexSearcher.

This is probably the right approach IMO: assign the work lazily instead of eagerly. That should account for all scenarios, including those where a few segments have many more matches than other segments. The objective is to balance the amount of actual work done across threads, not the number of indexed documents processed by each thread.
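The lazy, pull-based assignment discussed above can be sketched with plain `java.util.concurrent` primitives. This is a hypothetical illustration using a single shared work queue, which is simpler than full per-thread work stealing: segments are modelled as integers, `searchSegment` stands in for per-leaf query execution, and neither `LazySegmentScheduler` nor `searchAll` corresponds to Lucene's IndexSearcher or OpenSearch's ContextIndexSearcher API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: workers pull segments from one shared queue instead of fixed slices. */
final class LazySegmentScheduler {

    /** Searches every segment exactly once and returns the summed per-segment results. */
    static long searchAll(List<Integer> segments, int workers, ToLongFunction<Integer> searchSegment) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>(segments);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // poll() is atomic, so no segment is claimed twice; a worker that
            // finishes early simply claims the next remaining segment.
            Callable<Long> worker = () -> {
                long local = 0;
                Integer segment;
                while ((segment = queue.poll()) != null) {
                    local += searchSegment.applyAsLong(segment);
                }
                return local;
            };
            List<Future<Long>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                futures.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                try {
                    total += future.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in "search": pretend each segment yields one hit per document.
        long hits = searchAll(List.of(10, 8, 7, 6, 5, 4), 4, count -> count);
        System.out.println("total hits = " + hits); // total hits = 40
    }
}
```

Because idle workers keep pulling, one segment with unusually many matches delays only the worker that claimed it, which targets the matches-versus-documents concern above. The trade-off is losing a fixed, up-front slice layout and paying some coordination cost on the shared queue.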

… document distribution (opensearch-project#18451) Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com> Signed-off-by: sunqijun.jun <sunqijun.jun@bytedance.com>

… document distribution (opensearch-project#18451) Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com>

github-actions · 2025-08-19T18:42:43Z

Hello!
We have added a performance benchmark workflow that runs by adding a comment on the PR.
Please refer to https://github.com/opensearch-project/OpenSearch/blob/main/PERFORMANCE_BENCHMARKS.md for how to run benchmarks on pull requests.

kkewwei requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, VachaShah, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners June 5, 2025 17:48

kkewwei force-pushed the optimize_group branch from 20ac1f3 to 6967a61 June 5, 2025 17:49

kkewwei marked this pull request as draft June 6, 2025 00:45

Optimize grouping for segment concurrent search by ensuring that documents within each group are as equal as possible

12b64ba

Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com>

kkewwei force-pushed the optimize_group branch from d396023 to 12b64ba June 6, 2025 01:20

kkewwei closed this Jun 6, 2025

kkewwei reopened this Jun 6, 2025

kkewwei marked this pull request as ready for review June 6, 2025 02:54

asimmahmood1 closed this Jul 28, 2025

asimmahmood1 reopened this Jul 28, 2025

asimmahmood1 closed this Jul 28, 2025

asimmahmood1 reopened this Jul 28, 2025

expani mentioned this pull request Jul 28, 2025

[Intra-SegmentConcurrentSearch] Slicing mechanism #18851

Open

This was referenced Jul 28, 2025

[AUTOCUT] Gradle Check Flaky Test Report for ResourceAwareTasksTests #14293

Closed

[AUTOCUT] Gradle Check Flaky Test Report for IndexStatsIT #15836

Open

Merge branch 'main' into optimize_group

c36c8ca

jainankitk approved these changes Jul 31, 2025

jainankitk merged commit b53b446 into opensearch-project:main Jul 31, 2025
31 checks passed

opensearch-ci-bot mentioned this pull request Aug 1, 2025

[AUTOCUT] Gradle Check Flaky Test Report for TransferManagerRemoteDirectoryReaderTests #16676

Closed

expani mentioned this pull request Aug 5, 2025

Enabling Intra Segment Concurrent Search for only queries without aggregations #18879

Closed

asimmahmood1 added the v3.2.0, Performance, and Search:Performance labels Aug 19, 2025

BrewTestBot mentioned this pull request Aug 20, 2025

opensearch 3.2.0 Homebrew/homebrew-core#234146

Merged

opensearch-ci-bot mentioned this pull request Aug 23, 2025

[AUTOCUT] Gradle Check Flaky Test Report for IngestFromKafkaIT #17215

Open

This was referenced Jan 10, 2026

docs: add segment concurrent search optimization report for v3.2.0 tkykenmt/opensearch-feature-explorer#1200

Merged

[feature] Segment Concurrent Search Optimization tkykenmt/opensearch-feature-explorer#1131

Closed

jainankitk merged 5 commits into opensearch-project:main from kkewwei:optimize_group

7 participants
