Skip to content

Stop minimizing regexps [LUCENE-10296] #11332

@asfimport

Description

@asfimport

In current trunk, we let caller (e.g. RegExpQuery) try to "reduce" the expression. The parser nor the low-level executors don't implicitly call exponential-time algorithms anymore.

But now that we have cleaned this up, we can see it is even worse than just calling determinize()}. We still call minimize() which is much crazier and much more. 

We stopped doing this for all other AutomatonQuery subclasses a long time ago, as we determined that it didn't help performance. Additionally, minimization vs. determinization is even less important than early days where we found trouble: the representation got a lot better. Today when you finishState we do a lot of practical sorting/coalescing on-the-fly. Also we added this fancy UTF32-to-UTF8 automata convertor, that makes the worst-case-space-per-state significantly lower than it was before? So why minimize() ?

Let's just replace minimize() calls with determinize() calls? I've already swapped them out for all of src/test}, to get jenkins looking for issues ahead of time.


Migrated from LUCENE-10296 by Robert Muir (@rmuir), resolved Dec 09 2021
Pull requests: #528

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions