Stop aligning windows in BooleanScorer. #12488

Merged: jpountz merged 4 commits into apache:main from jpountz:boolean_scorer_not_align_windows on Aug 5, 2023

@jpountz
Contributor

@jpountz jpountz commented Aug 4, 2023

BooleanScorer aligns windows to multiples of 2048, but it doesn't have to. Actually, not aligning windows can help evaluate fewer windows overall and speed up query evaluation.

This change speeds up counting `title OR 12` on wikimedium10m by ~18%.
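
To illustrate why unaligned windows can mean fewer windows, here is a minimal sketch. It is a simplified model and not Lucene's actual BooleanScorer code: the class `WindowSketch` and its methods are hypothetical names, and the model assumes a single sorted array of non-negative matching doc IDs and a window of 2048 doc IDs. An aligned window starts at the multiple of 2048 at or below the next match, while an unaligned window starts at the next match itself.

```java
// Illustrative model only; this is not Lucene's BooleanScorer implementation.
final class WindowSketch {
    // Window size assumed by this model: 2048 doc IDs.
    static final int SIZE = 1 << 11;
    static final int MASK = SIZE - 1;

    /** Counts windows visited when every window starts at a multiple of SIZE. */
    static int countAlignedWindows(int[] sortedDocs) {
        return countWindows(sortedDocs, true);
    }

    /** Counts windows visited when every window starts at the next matching doc. */
    static int countUnalignedWindows(int[] sortedDocs) {
        return countWindows(sortedDocs, false);
    }

    private static int countWindows(int[] sortedDocs, boolean aligned) {
        int windows = 0;
        int i = 0;
        while (i < sortedDocs.length) {
            int doc = sortedDocs[i];
            // Aligned: round down to a multiple of SIZE. Unaligned: start at the doc.
            int base = aligned ? (doc & ~MASK) : doc;
            long max = (long) base + SIZE; // long avoids overflow near Integer.MAX_VALUE
            windows++;
            // Consume every match that falls inside the window [base, max).
            while (i < sortedDocs.length && sortedDocs[i] < max) {
                i++;
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        int[] docs = {2000, 2100, 6000, 6100};
        System.out.println("aligned:   " + countAlignedWindows(docs));   // 3
        System.out.println("unaligned: " + countUnalignedWindows(docs)); // 2
    }
}
```

In this example, docs 2000 and 2100 straddle the 2048 boundary, so aligned windows need two windows to cover them while an unaligned window starting at 2000 covers both. The sketch only counts windows; it says nothing about the per-window cost, so a lower count here does not by itself imply faster query evaluation.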

@mikemccand
Member

Egads! That's an amazing gain!

@jpountz
Contributor Author

jpountz commented Aug 5, 2023

EDIT: this benchmark was not run correctly; see the next comment.

Counting tasks confirm the speedup:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                CountAndHighHigh       52.00      (4.3%)       51.84      (3.5%)   -0.3% (  -7% -    7%) 0.809
                 CountAndHighMed      197.84      (3.6%)      197.39      (3.4%)   -0.2% (  -6% -    7%) 0.839
                        PKLookup      242.74      (2.9%)      243.13      (3.1%)    0.2% (  -5% -    6%) 0.867
                     CountPhrase       12.55      (3.2%)       12.58      (3.8%)    0.2% (  -6% -    7%) 0.852
                       CountTerm     9032.87      (3.1%)     9142.12      (4.1%)    1.2% (  -5% -    8%) 0.292
                  CountOrHighMed       73.85     (12.7%)       81.64      (3.5%)   10.5% (  -4% -   30%) 0.000
                 CountOrHighHigh       46.92     (13.7%)       52.06      (3.9%)   11.0% (  -5% -   33%) 0.001

@jpountz jpountz merged commit 09e3b43 into apache:main Aug 5, 2023
@jpountz jpountz deleted the boolean_scorer_not_align_windows branch August 5, 2023 09:29
jpountz added a commit that referenced this pull request Aug 5, 2023
@jpountz
Contributor Author

jpountz commented Aug 6, 2023

I realized I made a mistake in the benchmark: my baseline was a couple of changes behind and probably missed #12475. I reran the benchmark correctly, and there is actually a small slowdown:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                 CountOrHighHigh       50.49     (11.0%)       48.07     (12.0%)   -4.8% ( -25% -   20%) 0.188
                  CountOrHighMed       79.22     (10.3%)       75.80     (11.4%)   -4.3% ( -23% -   19%) 0.210
                 CountAndHighMed      195.82      (3.4%)      194.63      (4.3%)   -0.6% (  -8% -    7%) 0.622
                CountAndHighHigh       51.30      (3.8%)       51.16      (4.9%)   -0.3% (  -8% -    8%) 0.851
                     CountPhrase       12.54      (2.5%)       12.54      (3.6%)   -0.0% (  -6% -    6%) 0.996
                       CountTerm     9052.13      (3.3%)     9086.40      (3.3%)    0.4% (  -6% -    7%) 0.719
                        PKLookup      241.61      (3.3%)      243.34      (2.6%)    0.7% (  -5% -    6%) 0.449

I will revert.

jpountz added a commit that referenced this pull request Aug 6, 2023
jpountz added a commit that referenced this pull request Aug 6, 2023