Increase window interval for CancellableBulkScorer inline with BooleanScorer#17824
Increase window interval for CancellableBulkScorer inline with BooleanScorer#17824jainankitk merged 3 commits intoopensearch-project:mainfrom
Conversation
Signed-off-by: expani <anijainc@amazon.com>
|
{"run-benchmark-test": "id_4"} |
|
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/2798/ . Final results will be published once the job is completed. |
|
❕ Gradle check result for 84528f1: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #17824 +/- ##
============================================
- Coverage 72.55% 72.37% -0.19%
+ Complexity 66502 66443 -59
============================================
Files 5399 5408 +9
Lines 307755 308080 +325
Branches 44656 44720 +64
============================================
- Hits 223295 222974 -321
- Misses 66266 66877 +611
- Partials 18194 18229 +35 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Benchmark ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/2798/
|
Benchmark Baseline Comparison ResultsBenchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/55/
|
|
I spent some time on crafting a query that can see a significant improvement from this change. Since, it's related to BooleanScrorer's window size, I created the query below to ensure Even this query didn't show any significant improvement/regression with 4096 as the interval size. I reused the same index and kept rebuilding OS Jar with the changed constant 2048 v/s 4096 to remove other variance factors. The reason for no change is only the starting window intervals are different and they quickly ( 9 for 2048 v/s 8 iterations for 4098 ) reach the terminal limit of 2^20 ( 1,048,576 ) after which the interval size is not increased by CancellableBulkScorer. To accurately show the improvement from this change, the score method should do some heavy operation ( per call ) other than moving the DISI and calling collector.collect ( the count of which is same in both cases ) AND terminate before the interval size reaches its terminal limit of 2^20 Given that we are reaching the cutoff for beta 3.0, should we spend more time on figuring out the exact query OR merge the change as there is no regression in existing Big5 and this constant is changed in Lucene ? @msfroh @jainankitk Let me know your thoughts on the same. |
I would go ahead and merge it. Updating the constant to be consistent with Lucene matches how the constant was defined in the first place. I can certainly see how the impact would be small, but IMO it doesn't hurt. |
I agree with @msfroh on this. Also, even the Lucene micro-benchmark doesn't show much improvement / regression either way. So, I would just go ahead and keep it consistent. |
|
It also makes debugging in future better, someone might think there's a performance regression due the constant being different and spend time trying to validate it. |
…17824) Signed-off-by: expani <anijainc@amazon.com> Signed-off-by: Sriram Ganesh <srignsh22@gmail.com>
…17824) Signed-off-by: expani <anijainc@amazon.com> Signed-off-by: Harsh Kothari <techarsh@amazon.com>
…17824) Signed-off-by: expani <anijainc@amazon.com> Signed-off-by: Harsh Kothari <techarsh@amazon.com>
Description
Increasing the interval size used by CancellableBulkScorer to be the same as BooleanScorer which was upgraded in Lucene by apache/lucene#13605
We can decide on making this change after some benchmark numbers to ensure no query types regress.
Related Issues
Resolves #17456