Description
TermInSetQuery currently "ping-pong" intersects a sorted list against the term dictionary.
Instead of sorted-list, it could possibly use Daciuk Mihov Automaton, which can be built in linear time. Then query could leverage Terms.intersect (e.g. TermInSetQuery could be an AutomatonQuery subclass).
This should give faster intersection of the terms, which is usually the heavy part of this query. For example BlockTree terms dictionary has a very efficient Terms.intersect that makes use of the underlying structure.
The annoying part: DaciukMihovAutomatonBuilder currently requires unicode strings and makes a UTF-32 automaton, which would then be converted to UTF-8 (binary) automaton via UTF32ToUTF8. But I think TermInSetQuery may allow arbitrary non-unicode binary strings?
In order to support arbitrarily binary terms (and to avoid conversions), the DaciukMihov code would have to modified, to support construction of a binary automaton directly. Probably this is actually simpler?
This is just an idea to get more performance, it hasn't been tested. feel free to close the issue if it doesnt work out.