Optimization in String Terms Aggregation query for Large Bucket Counts #18732

Merged
rishabhmaurya merged 15 commits into opensearch-project:main from vinaykpud:string-term-agg-opt
Sep 30, 2025

Conversation

@vinaykpud
Contributor

@vinaykpud vinaykpud commented Jul 11, 2025

Description

When the number of requested top-N buckets exceeds or is close to the maximum bucket ordinal, using a PriorityQueue for top-N selection is inefficient or redundant. So we made the following modifications:

  1. Use quickselect for top-N selection if the requested size is greater than 20% of the total number of buckets.
  2. If the requested size is greater than the number of buckets, return all the buckets.
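
As a rough illustration of the two selection approaches mentioned above (a minimal sketch, not the code in this PR): `TopNSketch`, `topNWithHeap`, and `topNWithQuickselect` are hypothetical names, and plain `long` doc counts stand in for the real bucket objects. With B buckets and requested size N, the heap version costs roughly O(B log N), while quickselect is O(B) on average plus a final sort of the N selected entries.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Illustrative only: picks the n largest doc counts, returned in descending order. */
final class TopNSketch {

    /** Heap-based selection: keeps a min-heap of at most n elements. */
    static long[] topNWithHeap(long[] counts, int n) {
        if (n <= 0) return new long[0];
        PriorityQueue<Long> minHeap = new PriorityQueue<>();
        for (long c : counts) {
            if (minHeap.size() < n) {
                minHeap.add(c);
            } else if (c > minHeap.peek()) {
                minHeap.poll();   // evict the current smallest of the top n
                minHeap.add(c);
            }
        }
        long[] out = new long[minHeap.size()];
        // poll() yields the smallest first, so fill from the end for descending order
        for (int i = out.length - 1; i >= 0; i--) out[i] = minHeap.poll();
        return out;
    }

    /** Quickselect-based selection: partitions a copy until the k largest are in a[0..k-1]. */
    static long[] topNWithQuickselect(long[] counts, int n) {
        long[] a = counts.clone();
        int k = Math.min(n, a.length);
        if (k <= 0) return new long[0];
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partitionDescending(a, lo, hi);
            if (p == k - 1) break;
            if (p < k - 1) lo = p + 1; else hi = p - 1;
        }
        long[] out = Arrays.copyOf(a, k);
        Arrays.sort(out);                       // ascending
        for (int i = 0; i < k / 2; i++) {       // reverse to descending
            long t = out[i]; out[i] = out[k - 1 - i]; out[k - 1 - i] = t;
        }
        return out;
    }

    /** Lomuto partition around the middle element; larger values end up on the left. */
    private static int partitionDescending(long[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        long pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] > pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(long[] a, int i, int j) {
        long t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Both methods return the same values; they differ only in how much work they do when N is a large fraction of B.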

Benchmark test results are here:

#18704 (comment)

Related Issues

Resolves #18704
Related #18650

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the bug and Search:Performance labels Jul 11, 2025
@vinaykpud vinaykpud force-pushed the string-term-agg-opt branch 4 times, most recently from fa96268 to 0cf5b78 Compare July 11, 2025 18:22
@github-actions
Contributor

❌ Gradle check result for b25271f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 242faae: FAILURE


@vinaykpud vinaykpud force-pushed the string-term-agg-opt branch from 242faae to 482a37e Compare July 14, 2025 22:34
@github-actions
Contributor

❌ Gradle check result for 482a37e: FAILURE


@vinaykpud vinaykpud force-pushed the string-term-agg-opt branch from 482a37e to 81211c1 Compare July 14, 2025 23:46
@github-actions
Contributor

❌ Gradle check result for 81211c1: FAILURE


@github-actions
Contributor

❌ Gradle check result for a81608e: FAILURE


@vinaykpud vinaykpud closed this Jul 15, 2025
@vinaykpud vinaykpud reopened this Jul 15, 2025
@github-actions
Contributor

❌ Gradle check result for a81608e: FAILURE


Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@github-actions
Contributor

❌ Gradle check result for b455fc7: FAILURE


Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@github-actions
Contributor

✅ Gradle check result for b6d4f94: SUCCESS

@rishabhmaurya
Contributor

rishabhmaurya commented Sep 30, 2025

I'm guessing we are reusing the same logic we used for numeric terms in #18702, i.e.:

size >= bucketsInOrd: return all
bucketsInOrd > size && bucketsInOrd <= 5*size: quickselect
else, i.e. bucketsInOrd > size && bucketsInOrd > 5*size: priority queue (pq)

LGTM

@vinaykpud
Contributor Author

Correct, the decision logic is the same.
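
For reference, the decision logic discussed above could be sketched like this. It is an illustration under stated assumptions, not the merged code: `SelectionStrategySketch`, `Strategy`, and `choose` are hypothetical names, and only the thresholds come from the review comment.

```java
/** Illustrative only: chooses how to select the top `size` buckets out of `bucketsInOrd`. */
final class SelectionStrategySketch {

    enum Strategy { RETURN_ALL, QUICK_SELECT, PRIORITY_QUEUE }

    static Strategy choose(int size, long bucketsInOrd) {
        if (size >= bucketsInOrd) {
            // Every bucket is requested, so no selection is needed.
            return Strategy.RETURN_ALL;
        }
        if (bucketsInOrd <= 5L * size) {
            // Requested size is at least 20% of the buckets: partition once with quickselect.
            return Strategy.QUICK_SELECT;
        }
        // Requested size is a small fraction of the buckets: a bounded heap does less work.
        return Strategy.PRIORITY_QUEUE;
    }
}
```

With `size = 10`, this gives `RETURN_ALL` for 10 buckets, `QUICK_SELECT` for 11 to 50 buckets, and `PRIORITY_QUEUE` for 51 or more.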

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@github-actions
Contributor

❌ Gradle check result for e3abf3e: FAILURE


@vinaykpud vinaykpud closed this Sep 30, 2025
@vinaykpud vinaykpud reopened this Sep 30, 2025
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@github-actions
Contributor

❕ Gradle check result for f42714f: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@github-actions
Contributor

✅ Gradle check result for 33a1899: SUCCESS

@rishabhmaurya rishabhmaurya merged commit 2817029 into opensearch-project:main Sep 30, 2025
33 checks passed
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Oct 15, 2025
opensearch-project#18732)

* Optimize String terms agg

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Updated the algorithm selection logic and cleanup

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Updated the algorithm selection logic and cleanup

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Updated bucket sorting at shard level for keyorder

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* fixed bug in the condition

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Updated logic in topN selection depending on request size

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* use priority queue method for significant terms

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Added some comments

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* updated partiallyBuiltBucketComparator null check logic

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Added tests and updated GlobalOrdinalsStringTermsAggregator

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Fixed spotless checks

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Fixed issues in changelog

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

---------

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

Labels

bug Something isn't working Search:Performance

Development

Successfully merging this pull request may close these issues.

[Performance] Optimize String terms agg

3 participants