Bump the window size of disjunctions from 2,048 to 4,096. by jpountz · Pull Request #13605 · apache/lucene

jpountz · 2024-07-24T07:35:25Z

It's been pointed out multiple times that one difference between Tantivy and Lucene is that Tantivy uses windows of 4,096 docs, while Lucene uses a 2x smaller window of 2,048 docs, and that this might explain part of the performance difference. luceneutil suggests that bumping the window size to 4,096 does indeed improve performance for counting disjunctions (CountOrHighMed +21.9%, CountOrHighHigh +30.6%), but not for top-k queries. I'm still suggesting bumping the window size across the board to keep our disjunction scorers consistent.
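To make the notion of a window concrete, here is a minimal sketch of window-at-a-time counting for a disjunction. It is an illustration under simplifying assumptions, not Lucene's actual disjunction scorer: postings are plain sorted `int[]` arrays, and the class `WindowedDisjunctionCount` and its methods `countMatches` and `numWindows` are hypothetical names. For each window of `WINDOW_SIZE` docs, every clause drains its matching docs into a bitset, and the window is then replayed by counting the set bits.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified sketch of window-at-a-time counting for a
 * disjunction. Names are illustrative only; this is not Lucene's code.
 */
public class WindowedDisjunctionCount {

  // Window size in docs; the PR bumps Lucene's value from 2,048 to 4,096.
  static final int WINDOW_SIZE = 4096;

  /** Counts docs matching at least one clause, one window at a time. */
  static int countMatches(int[][] postings, int maxDoc) {
    long[] matching = new long[WINDOW_SIZE / 64]; // one bit per doc in the window
    int[] upto = new int[postings.length];        // cursor into each sorted postings list
    int count = 0;
    for (int windowBase = 0; windowBase < maxDoc; windowBase += WINDOW_SIZE) {
      int windowMax = Math.min(windowBase + WINDOW_SIZE, maxDoc);
      Arrays.fill(matching, 0L); // per-window reset cost
      // Collect: each clause drains the docs that fall inside this window.
      for (int c = 0; c < postings.length; c++) {
        int[] docs = postings[c];
        while (upto[c] < docs.length && docs[upto[c]] < windowMax) {
          int delta = docs[upto[c]++] - windowBase;
          matching[delta >>> 6] |= 1L << delta; // shift uses the low 6 bits of delta
        }
      }
      // Replay: a count only needs the number of set bits in the window.
      for (long word : matching) {
        count += Long.bitCount(word);
      }
    }
    return count;
  }

  /** Number of windows needed to cover maxDoc docs. */
  static int numWindows(int maxDoc, int windowSize) {
    return (maxDoc + windowSize - 1) / windowSize;
  }

  public static void main(String[] args) {
    int[][] postings = {
      {1, 5, 4096, 9000},
      {5, 70, 4097, 9000, 12000},
    };
    System.out.println(countMatches(postings, 12288)); // 7 distinct docs
    System.out.println(numWindows(1_000_000, 2048));   // 489
    System.out.println(numWindows(1_000_000, 4096));   // 245
  }
}
```

Doubling the window halves the number of windows (489 vs. 245 for 1,000,000 docs), and with it the per-window work of resetting the bitset and visiting every clause. That is one plausible reason counting disjunctions gain the most, though the sketch does not model scoring or top-k collection.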

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     CountPhrase        3.27     (11.6%)        3.14      (8.0%)   -4.1% ( -21% -   17%) 0.189
               HighTermMonthSort     3521.28      (3.5%)     3481.74      (2.8%)   -1.1% (  -7% -    5%) 0.262
                        PKLookup      289.42      (1.3%)      286.47      (2.2%)   -1.0% (  -4% -    2%) 0.075
                      TermDTSort      352.01      (6.5%)      348.89      (5.6%)   -0.9% ( -12% -   11%) 0.642
                          Phrase       11.85      (5.3%)       11.76      (5.0%)   -0.8% ( -10% -    9%) 0.634
                       OrHighLow      772.82      (2.4%)      767.24      (2.1%)   -0.7% (  -5% -    3%) 0.313
                 CountAndHighMed      120.78      (2.3%)      120.10      (2.5%)   -0.6% (  -5% -    4%) 0.449
           HighTermDayOfYearSort      821.48      (3.5%)      818.62      (2.7%)   -0.3% (  -6% -    6%) 0.724
               HighTermTitleSort      148.84      (2.9%)      148.33      (2.8%)   -0.3% (  -5% -    5%) 0.700
                     AndHighHigh       62.36      (1.7%)       62.17      (1.8%)   -0.3% (  -3% -    3%) 0.584
                CountAndHighHigh       41.41      (2.5%)       41.34      (2.6%)   -0.2% (  -5% -    5%) 0.836
                          Fuzzy1       96.24      (1.0%)       96.09      (1.2%)   -0.2% (  -2% -    2%) 0.667
                      AndHighLow      827.59      (2.7%)      826.89      (2.4%)   -0.1% (  -5% -    5%) 0.918
                      AndHighMed       93.35      (1.6%)       93.29      (1.7%)   -0.1% (  -3% -    3%) 0.903
            HighTermTitleBDVSort       16.30      (4.2%)       16.29      (6.7%)   -0.0% ( -10% -   11%) 0.984
                       OrHighMed      153.42      (2.6%)      153.41      (2.2%)   -0.0% (  -4% -    4%) 0.994
                         Respell       46.72      (1.3%)       46.72      (1.4%)    0.0% (  -2% -    2%) 0.975
                       And3Terms      155.73      (2.2%)      155.95      (1.4%)    0.1% (  -3% -    3%) 0.805
                          Fuzzy2       58.66      (0.9%)       58.77      (1.1%)    0.2% (  -1% -    2%) 0.566
                      OrHighHigh       75.70      (2.6%)       75.90      (2.3%)    0.3% (  -4% -    5%) 0.733
                       CountTerm     9110.00      (4.3%)     9142.10      (3.2%)    0.4% (  -6% -    8%) 0.768
                    AndStopWords       29.47      (2.6%)       29.57      (1.3%)    0.4% (  -3% -    4%) 0.579
             And2Terms2StopWords      150.30      (2.1%)      150.86      (1.1%)    0.4% (  -2% -    3%) 0.487
                      OrHighRare      237.33      (5.7%)      238.26      (6.2%)    0.4% ( -10% -   13%) 0.837
                         MedTerm      553.55      (6.0%)      555.97      (7.7%)    0.4% ( -12% -   15%) 0.841
                        Wildcard       34.08      (3.2%)       34.25      (3.4%)    0.5% (  -5% -    7%) 0.630
                    OrNotHighLow      761.70      (3.2%)      766.33      (2.6%)    0.6% (  -5% -    6%) 0.511
              Or2Terms2StopWords      156.10      (3.2%)      157.14      (1.8%)    0.7% (  -4% -    5%) 0.416
                        Or3Terms      156.59      (3.0%)      157.70      (1.9%)    0.7% (  -4% -    5%) 0.374
                        HighTerm      440.27      (5.6%)      443.89      (7.5%)    0.8% ( -11% -   14%) 0.695
                         LowTerm      892.27      (5.2%)      900.48      (6.8%)    0.9% ( -10% -   13%) 0.632
                     OrStopWords       31.88      (4.7%)       32.29      (2.6%)    1.3% (  -5% -    9%) 0.276
                         Prefix3      214.22      (3.4%)      217.48      (2.8%)    1.5% (  -4% -    8%) 0.124
                   OrHighNotHigh      247.52      (4.8%)      254.52      (5.1%)    2.8% (  -6% -   13%) 0.071
                          IntNRQ      144.53     (17.2%)      148.66     (17.9%)    2.9% ( -27% -   45%) 0.607
                    OrNotHighMed      330.23      (6.5%)      340.12      (5.4%)    3.0% (  -8% -   15%) 0.114
                    OrHighNotMed      285.11      (5.2%)      293.82      (6.2%)    3.1% (  -7% -   15%) 0.092
                    OrHighNotLow      429.94      (5.4%)      443.15      (6.8%)    3.1% (  -8% -   16%) 0.113
                   OrNotHighHigh      189.30      (5.9%)      195.25      (5.4%)    3.1% (  -7% -   15%) 0.079
                  CountOrHighMed       99.90     (22.5%)      121.78     (20.0%)   21.9% ( -16% -   83%) 0.001
                 CountOrHighHigh       53.76     (35.1%)       70.24     (32.5%)   30.6% ( -27% -  151%) 0.004
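One way to read the table is to keep only the tasks whose p-value falls below a conventional 0.05 threshold; that threshold is an assumption for illustration, not something the PR states. The sketch below applies it to a handful of rows copied from the table above (the class `SignificantTasks` and method `significant` are hypothetical names; the nested `record` needs Java 16 or later). Only the two counting disjunctions pass, which is consistent with the summary that counting queries improve while top-k queries do not move measurably.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: filters luceneutil rows by p-value. */
public class SignificantTasks {

  // A task name and its reported p-value, copied from the benchmark table.
  record Row(String task, double pValue) {}

  /** Returns the names of tasks whose p-value is strictly below alpha. */
  static List<String> significant(List<Row> rows, double alpha) {
    List<String> out = new ArrayList<>();
    for (Row r : rows) {
      if (r.pValue() < alpha) {
        out.add(r.task());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("CountPhrase", 0.189),
        new Row("PKLookup", 0.075),
        new Row("OrHighHigh", 0.733),
        new Row("OrHighNotHigh", 0.071),
        new Row("OrNotHighHigh", 0.079),
        new Row("CountOrHighMed", 0.001),
        new Row("CountOrHighHigh", 0.004));
    System.out.println(significant(rows, 0.05)); // [CountOrHighMed, CountOrHighHigh]
  }
}
```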


jpountz added this to the 9.12.0 milestone Jul 24, 2024

jpountz changed the title ~~Bump the window size of disjunction from 2,048 to 4,096.~~ Bump the window size of disjunctions from 2,048 to 4,096. Jul 24, 2024

jpountz merged commit 8d4f7a6 into apache:main Jul 25, 2024

jpountz deleted the bump_disjunction_window_size branch July 25, 2024 13:38

expani mentioned this pull request Apr 7, 2025

Increase window interval for CancellableBulkScorer inline with BooleanScorer (opensearch-project/OpenSearch#17824, merged)

jpountz merged 1 commit into apache:main from jpountz:bump_disjunction_window_size
