Reduce overhead of disabling scoring on BooleanScorer. #12475

Merged
jpountz merged 1 commit into apache:main from jpountz:reduced_no_scoring_overhead_booleanscorer
Aug 3, 2023

jpountz commented Jul 31, 2023

This is a subset of #12415, which I'm extracting to its own pull request in order to have separate data points in nightly benchmarks.

Results on wikimedium10m and wikinightly counting tasks:

                       CountTerm     4624.91      (6.4%)     4581.34      (6.4%)   -0.9% ( -12% -   12%) 0.640
                 CountAndHighMed      280.03      (4.5%)      280.15      (4.4%)    0.0% (  -8% -    9%) 0.974
                     CountPhrase        7.22      (3.0%)        7.24      (1.8%)    0.3% (  -4% -    5%) 0.728
                CountAndHighHigh       52.84      (4.9%)       53.12      (5.6%)    0.5% (  -9% -   11%) 0.755
                        PKLookup      232.01      (3.6%)      235.45      (2.8%)    1.5% (  -4% -    8%) 0.144
                 CountOrHighHigh       42.37      (6.1%)       56.04      (9.1%)   32.3% (  16% -   50%) 0.000
                  CountOrHighMed       30.56      (6.5%)       40.46      (9.8%)   32.4% (  15% -   52%) 0.000
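
For readers checking the numbers: the percentage column is consistent with the relative change of the second throughput figure over the first. The sketch below recomputes it for three rows. The class and method names are hypothetical, and reading the first numeric column as the baseline and the third as the candidate is an assumption, since the thread does not label the columns.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene or its benchmark tooling.
final class PctDiff {
    // Relative change of candidate over baseline, in percent.
    // Assumes column 1 is the baseline figure and column 3 the candidate figure.
    static double pctDiff(double baseline, double candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    // Formats with one decimal, matching the precision shown in the table.
    static String format(double pct) {
        return String.format(Locale.ROOT, "%.1f%%", pct);
    }

    public static void main(String[] args) {
        System.out.println("CountTerm       " + format(pctDiff(4624.91, 4581.34))); // -0.9%
        System.out.println("CountOrHighHigh " + format(pctDiff(42.37, 56.04)));     // 32.3%
        System.out.println("CountOrHighMed  " + format(pctDiff(30.56, 40.46)));     // 32.4%
    }
}
```

Only the two disjunction counting tasks move by more than their reported variation, which matches the scope of the change.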

jpountz added this to the 9.8.0 milestone Jul 31, 2023

jpountz commented Jul 31, 2023

The failure is suspicious; I'll look into it.

jpountz commented Aug 1, 2023

The failure is an unrelated but real bug. BooleanScorer sometimes forwards directly to an inner bulk scorer when only that one scorer matches on a range. This may cause the collector's competitive iterator to be advanced to a document that is outside of the scored range (which feels like the root cause of the issue) and greater than a match of another clause of the disjunction.

jpountz commented Aug 1, 2023

Opened #12481.

jpountz merged commit acffcfa into apache:main Aug 3, 2023
jpountz deleted the reduced_no_scoring_overhead_booleanscorer branch August 3, 2023 05:17
jpountz added a commit that referenced this pull request Aug 3, 2023