Optimize grouping for segment concurrent search by ensuring that documents within each group are as equal as possible#18451
Conversation
|
❌ Gradle check result for 6967a61: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…ments within each group are as equal as possible Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com>
|
❌ Gradle check result for 12b64ba: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 17a02f6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❕ Gradle check result for 17a02f6: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
|
@asimmahmood1 @expani It's ok now, please help merge in your spare time. |
|
@jainankitk please review this as we need another maintainer to merge it. |
jainankitk
left a comment
There was a problem hiding this comment.
Thanks @kkewwei for making this change. I was initially wondering if greedy approach gives the most optimal solution for this problem, but after running few cases/permutations in my head, I am indeed convinced. Should also have a simple proof using contradiction. Having the leaves reverse sorted is important!!
That's a fair point, but even the existing implementation doesn't necessarily account for this.
This is probably the right approach IMO, to assign the work lazily instead of eagerly. That should account for all the scenarios including the ones where few segments might have many more matches compared to other segments. The objective is to |
… document distribution (opensearch-project#18451) Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com> Signed-off-by: sunqijun.jun <sunqijun.jun@bytedance.com>
… document distribution (opensearch-project#18451) Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com>
|
Hello! |
1 similar comment
|
Hello! |
… document distribution (opensearch-project#18451) Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com>
Description
Druing segment concurrent search, we adopt a round-robin approach to distribute slices. However, this may lead to a significant imbalance in the number of documents across groups. For instance, consider a scenario with segment document counts as follows: 10, 8, 7, 6, 5, 4 and a slice size of 4.
Current grouping results (with a max-min document difference of 9):
group0: 10, 5
group1: 8, 4
group2: 7
group3: 6
Optimized grouping results (reducing the max-min difference to 3):
group0: 10
group1: 8
group2: 7,4
group3: 6,5
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.