Expose iterator over query terms in TermInSetQuery #12280
gsmiller wants to merge 4 commits into apache:main
Conversation
Please, let's not go down this path. It is a MultiTermQuery; if you want to change how it works behind the scenes, you "plug in" with RewriteMethod.
Thanks @rmuir. It would be ideal if we could do this through RewriteMethod, but I'm not sure how we can actually accomplish that. The problem is in the implementation of
Update: I added a sandbox query to this PR just to demonstrate the use-case for extending
 * TermsEnum#seekCeil(BytesRef)} to produce a terms iterator, which is compatible with {@code
 * BloomFilteringPostingsFormat}.
 */
public class PKTermInSetQuery extends TermInSetQuery {
This class is for demo purposes only. I'm not suggesting we merge it as part of this PR. I only want to demonstrate how a class might leverage getQueryTerms.
I still don't think it makes sense to expose this. As I said on the dev list, your problem is that you use a custom postings format and you want it to accelerate the intersection. The cleanest way to do this is to hand off the intersection to the postings format directly, rather than worry about seekCeil/seekExact and subclassing queries or exposing stuff. It should give a performance improvement using the default postings format as well (at least it did for other queries when mikemccand added it). So, IMO we should try to fix this query to use Terms.intersect() [see #12176], then override Terms.intersect for the BloomPostingsFormat to make use of the bloom filters to speed up intersection.
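To make the "hand off the intersection" idea concrete, here is a minimal toy sketch in plain Java. It deliberately does not use Lucene's classes or the real Terms.intersect signature: ToyTermsDict, SortedToyTermsDict, BloomToyTermsDict, and the seeks counter are all hypothetical names invented for illustration. The point is only that when the dictionary owns the intersection, a bloom-backed dictionary can skip seeks for terms it knows are absent, and the caller does not need to change.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeSet;

// Toy stand-in for a terms dictionary. Hypothetical; NOT Lucene's Terms API.
interface ToyTermsDict {
    // Returns the query terms that exist in this dictionary, in sorted order.
    List<String> intersect(TreeSet<String> queryTerms);
}

// Default strategy: one ordered "seek ceiling" per query term.
class SortedToyTermsDict implements ToyTermsDict {
    protected final TreeSet<String> indexed = new TreeSet<>();
    int seeks = 0; // counts dictionary seeks, for comparison only

    SortedToyTermsDict(List<String> terms) {
        indexed.addAll(terms);
    }

    @Override
    public List<String> intersect(TreeSet<String> queryTerms) {
        List<String> hits = new ArrayList<>();
        for (String q : queryTerms) {
            seeks++;
            String ceil = indexed.ceiling(q); // smallest indexed term >= q
            if (q.equals(ceil)) {
                hits.add(q);
            }
        }
        return hits;
    }
}

// Bloom-backed strategy: consult a small bit-set filter before any seek.
class BloomToyTermsDict extends SortedToyTermsDict {
    private static final int BITS = 1 << 16;
    private final BitSet bloom = new BitSet(BITS);

    BloomToyTermsDict(List<String> terms) {
        super(terms);
        for (String t : terms) {
            bloom.set(h1(t));
            bloom.set(h2(t));
        }
    }

    // Two simple hash functions; good enough for a toy filter.
    private static int h1(String s) {
        return Math.floorMod(s.hashCode(), BITS);
    }

    private static int h2(String s) {
        return Math.floorMod(s.hashCode() * 31 + 17, BITS);
    }

    // May return true for an absent term, never false for a present one.
    boolean mightContain(String t) {
        return bloom.get(h1(t)) && bloom.get(h2(t));
    }

    @Override
    public List<String> intersect(TreeSet<String> queryTerms) {
        List<String> hits = new ArrayList<>();
        for (String q : queryTerms) {
            if (!mightContain(q)) {
                continue; // definite miss: no seek needed
            }
            seeks++;
            if (indexed.contains(q)) { // exact lookup confirms the match
                hits.add(q);
            }
        }
        return hits;
    }
}
```

Both toy dictionaries return the same matches for the same query terms; only the number of seeks can differ. That is the design argument made above: the query asks the dictionary to intersect, and each postings format decides how to do it.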
Got it, thanks @rmuir. I hadn't seen your dev list reply yet. This all makes sense. I'll close this out and have a look at leveraging intersect. Seems like a better path forward. Thanks!
Description
I'd like to propose we add an API to TermInSetQuery that exposes an iterator over the query terms. This is useful for extending TermInSetQuery. One concrete use-case for this is needing to change the way term intersection happens with the indexed terms dictionary to support bloom filters, as described in this email thread.
I don't think there's any harm in exposing this, but am interested in feedback of course! This abstraction decouples callers from the current prefix-coding implementation details, so it seems clean.
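As a rough illustration of the decoupling described here, the following toy sketch stores terms in a prefix-coded form internally but exposes them only through a plain iterator. It is not Lucene code: ToyPrefixCodedTermSet and its fields are hypothetical names, and the encoding is a simplified stand-in for whatever TermInSetQuery uses internally.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical toy class; not Lucene's TermInSetQuery or its internal storage.
class ToyPrefixCodedTermSet implements Iterable<String> {
    // For each term: how many leading chars it shares with the previous term,
    // plus the remaining suffix. Callers never see this representation.
    private final List<Integer> sharedPrefixLengths = new ArrayList<>();
    private final List<String> suffixes = new ArrayList<>();

    // Sorted input keeps shared prefixes long, but any order decodes correctly.
    ToyPrefixCodedTermSet(List<String> terms) {
        String prev = "";
        for (String term : terms) {
            int shared = 0;
            int max = Math.min(prev.length(), term.length());
            while (shared < max && prev.charAt(shared) == term.charAt(shared)) {
                shared++;
            }
            sharedPrefixLengths.add(shared);
            suffixes.add(term.substring(shared));
            prev = term;
        }
    }

    // The only public view of the terms: a decoded, in-order iterator.
    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int pos = 0;
            private String prev = "";

            @Override
            public boolean hasNext() {
                return pos < suffixes.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Rebuild the term from the previous term's prefix and the stored suffix.
                prev = prev.substring(0, sharedPrefixLengths.get(pos)) + suffixes.get(pos);
                pos++;
                return prev;
            }
        };
    }
}
```

A subclass or caller that only iterates, for example `for (String t : termSet) { ... }`, keeps working if the internal encoding changes, which is the sense in which an iterator hides the prefix-coding details.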