Skip to content

Should we have a NFA Query? [LUCENE-10010] #11049

@asfimport

Description

@asfimport

Today when a RegexpQuery is created, it will be translated to NFA, determinized to DFA and eventually become an AutomatonQuery, which is very fast. However, not every NFA could be determinized to DFA easily, the example given in #11020 showed how easy could a short regexp break the determinize process.

Maybe, instead of marking those kind of queries as adversarial cases, we could make a new kind of NFA query, which execute directly on NFA and thus no need to worry about determinize process or determinized DFA size. It should be slower, but also makes those adversarial cases doable.

This article has provided a simple but efficient way of searching over NFA, essentially it is a partial determinize process that only determinize the necessary part of DFA. Maybe we could give it a try?


Migrated from LUCENE-10010 by Patrick Zhai (@zhaih), updated Dec 21 2021
Pull requests: #225, #296

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions