Description
In current trunk, we let the caller (e.g. RegExpQuery) try to "reduce" the expression. Neither the parser nor the low-level executors implicitly call exponential-time algorithms anymore.
But now that we have cleaned this up, we can see it is even worse than just calling determinize(). We still call minimize(), which is much crazier and much more.
We stopped doing this for all other AutomatonQuery subclasses a long time ago, as we determined that it didn't help performance. Additionally, minimization vs. determinization matters even less than in the early days, when we ran into trouble: the representation has gotten a lot better. Today, when you call finishState, we do a lot of practical sorting/coalescing on the fly. We also added the fancy UTF32-to-UTF8 automaton converter, which makes the worst-case space per state significantly lower than it was before. So why minimize()?
Let's just replace the minimize() calls with determinize() calls? I've already swapped them out for all of src/test, to get Jenkins looking for issues ahead of time.
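
For readers less familiar with the distinction, here is a minimal, self-contained sketch of what determinization does. It is a toy subset construction over a two-letter alphabet, not Lucene's implementation: the class name `DeterminizeSketch`, the hard-coded NFA, and the `accepts` helper are all hypothetical and exist only for illustration. The point is that a determinized automaton can already be run one state at a time; minimizing it would at most merge equivalent states and would not change which strings are accepted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy example, not Lucene code.
class DeterminizeSketch {
    // Toy NFA over {a, b} accepting strings whose second-to-last character is 'a'.
    // NFA[state][symbol] lists the target states (symbol 0 = 'a', 1 = 'b').
    static final int[][][] NFA = {
        { {0, 1}, {0} }, // state 0: loops on anything, may guess "this 'a' is second-to-last"
        { {2},    {2} }, // state 1: consumes the last character
        { {},     {}  }  // state 2: accepting, no outgoing transitions
    };
    static final int NFA_ACCEPT = 2;

    // Resulting DFA: one int[2] row of targets per state, plus an accept flag per state.
    final List<int[]> dfaTransitions = new ArrayList<>();
    final List<Boolean> dfaAccept = new ArrayList<>();

    DeterminizeSketch() {
        // Each DFA state is a set of NFA states; in the worst case there are 2^n of them.
        Map<BitSet, Integer> ids = new HashMap<>();
        Deque<BitSet> work = new ArrayDeque<>();
        BitSet start = new BitSet();
        start.set(0);
        register(start, ids, work);
        while (!work.isEmpty()) {
            BitSet cur = work.poll();
            int curId = ids.get(cur);
            for (int sym = 0; sym < 2; sym++) {
                // Union of all NFA targets reachable from the current set on this symbol.
                BitSet next = new BitSet();
                for (int s = cur.nextSetBit(0); s >= 0; s = cur.nextSetBit(s + 1)) {
                    for (int t : NFA[s][sym]) {
                        next.set(t);
                    }
                }
                Integer id = ids.get(next);
                if (id == null) {
                    id = register(next, ids, work);
                }
                dfaTransitions.get(curId)[sym] = id;
            }
        }
    }

    // Assigns the next free DFA state id to a newly discovered set of NFA states.
    private int register(BitSet set, Map<BitSet, Integer> ids, Deque<BitSet> work) {
        int id = ids.size();
        ids.put(set, id);
        dfaTransitions.add(new int[2]);
        dfaAccept.add(set.get(NFA_ACCEPT));
        work.add(set);
        return id;
    }

    int numStates() {
        return dfaTransitions.size();
    }

    // Runs the DFA: exactly one current state per input character, no backtracking.
    boolean accepts(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            state = dfaTransitions.get(state)[c - 'a'];
        }
        return dfaAccept.get(state);
    }
}
```

With this toy input, `new DeterminizeSketch()` builds a four-state DFA from the three-state NFA, and `accepts("ab")` is true while `accepts("ba")` is false.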
Migrated from LUCENE-10296 by Robert Muir (@rmuir), resolved Dec 09 2021
Pull requests: #528