Skip to content

MultiTermQuery return null for ScoreSupplier#13454

Merged
mayya-sharipova merged 5 commits intoapache:mainfrom
mayya-sharipova:multi-term-query-null-query-supplier
Jun 6, 2024
Merged

MultiTermQuery return null for ScoreSupplier#13454
mayya-sharipova merged 5 commits intoapache:mainfrom
mayya-sharipova:multi-term-query-null-query-supplier

Conversation

@mayya-sharipova
Copy link
Copy Markdown
Contributor

MultiTermQuery return null for ScoreSupplier if there are no terms in an index that
match query terms.

With the introduction of PR #12156 we saw degradation in performance of bool queries where one of the mandatory clauses is a TermInSetQuery with query terms not present in the field. Before for such cases TermsInSetQuery returned null for ScoreSupplier which would shortcut the whole bool query.

This PR adds ability for MultiTermQuery to return null for ScoreSupplier if a field doesn't contain any query terms.

Relates to PR #12156

MultiTermQuery return null for ScoreSupplier if there are no terms in an index that
 match query terms.

With the introduction of PR apache#12156 we saw degradation in performance of bool queries
where one of the mandatory clauses is a TermInSetQuery with query terms not present in
the field. Before for such cases TermsInSetQuery returned null for ScoreSupplier which
would shortcut the whole bool query.

This PR adds ability for MultiTermQuery to return null for ScoreSupplier if a field
doesn't contain any query terms.

Relates to PR apache#12156
@mayya-sharipova
Copy link
Copy Markdown
Contributor Author

mayya-sharipova commented Jun 5, 2024

@gsmiller @jpountz I don't know all the implications for this change, so your feedback is welcome.

Copy link
Copy Markdown
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting change. Instead of pushing the full cost of rewriting the query to ScorerSupplier#get, you are collecting the first 16 terms when calling Weight#scorerSupplier and the rest (potentially thousands of terms) when calling ScorerSupplier#get. This in-turn allows making better decisions for only a small amount of work pushed to Weight#scorerSupplier.

I left some style comments but I like the approach.

@mayya-sharipova
Copy link
Copy Markdown
Contributor Author

@jpountz Thanks for your feedback, I addressed it, and the code looks much better with your suggestions!

Copy link
Copy Markdown
Contributor

@gsmiller gsmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable change to me as well. Thanks!

@mayya-sharipova mayya-sharipova merged commit 512ff4a into apache:main Jun 6, 2024
@mayya-sharipova mayya-sharipova deleted the multi-term-query-null-query-supplier branch June 6, 2024 18:44
mayya-sharipova added a commit that referenced this pull request Jun 6, 2024
MultiTermQuery return null for ScoreSupplier if there are no terms in an index that
 match query terms.

With the introduction of PR #12156 we saw degradation in performance of bool queries
where one of the mandatory clauses is a TermInSetQuery with query terms not present in
the field. Before for such cases TermsInSetQuery returned null for ScoreSupplier which
would shortcut the whole bool query.

This PR adds ability for MultiTermQuery to return null for ScoreSupplier if a field
doesn't contain any query terms.

Relates to PR #12156
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants