LUCENE-10010 Introduce NFARunAutomaton to run NFA directly by zhaih · Pull Request #225 · apache/lucene

zhaih · 2021-07-26T21:29:01Z

Description

https://issues.apache.org/jira/browse/LUCENE-10010

Introduces NFARunAutomaton to run NFA directly

Works to to:

Integrate with current RunAutomaton class hierarchy
Further optimize the NFARunAutomaton implementation

Tests

A unit test that assert the NFARunAutomaton behaves the same as the DFA one by using random generated regex strings

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.

mikemccand

This is super exciting! I'm amazed how little code you needed to get this first version running.

I left some comments about how we might share some code with RunAutomaton instead of copying/forking those methods, but I don't think that's a blocker issue.

I love the randomized test - make a random RegExp, find random strings it accepts and confirm the NFARunAutomaton accepts them too. And then generate random (mostly rejected) strings and confirm those are rejected too. And change the order of those two, to exercise the lazy caching.

Another way to make random NFA would be to 1) enumerate random strings, 2) make DFA from those strings (using the efficient builder API Lucene provides), and 3) simply duplicating some states in that DFA creating an NFA, and then 4) confirm that the resulting NFA only accepts that same subset of strings.

Finally, I am really curious about performance. If we take a DFA as it is executed today as baseline, and compare to this new NFA path (running a DFA), what is the overhead?

lucene/core/src/java/org/apache/lucene/util/automaton/NFARunAutomaton.java

mikemccand · 2021-07-27T15:44:27Z

lucene/core/src/java/org/apache/lucene/util/automaton/NFARunAutomaton.java

+    if (stateSet.size() == 0) {
+      return null;
+    }
+    return new DState(stateSet.getArray());


We can't just use StateSet.freeze here?

lucene/core/src/java/org/apache/lucene/util/automaton/MinimizationOperations.java

lucene/core/src/java/org/apache/lucene/util/automaton/NFARunAutomaton.java

rmuir · 2021-07-28T12:31:20Z

This is super exciting! I'm amazed how little code you needed to get this first version running.

but a runautomaton for this won't run any queries on its own: brute force isn't how these queries actually work. the important part is the intersection (skipping around)...

I suggest, please let's not try to "overshare" and refactor all this stuff alongside DFA stuff until there is a query we can actually benchmark to see if the performance is even viable

mikemccand · 2021-07-28T12:44:59Z

I suggest, please let's not try to "overshare" and refactor all this stuff alongside DFA stuff until there is a query we can actually benchmark to see if the performance is even viable

OK yeah +1 to keep it wholly separate (full fork) for now until we learn more how this NFARegexpQuery behaves.

mikemccand

This looks amazing! I think it is close!

I love how you were able to slightly abstract out the RunAutomaton methods so that NFARunAutomaton could be swapped in instead, and no change is needed to AutomatonQuery.

But I think checking for determinizeWorkLimit==0 way down super low inside determinize is a bit too low level ...

I'm super eager to see how performance compares for NFA execution method (determinize lazily on demand) for AutomatonQuery versus the current "determinize up front" DFA approach.

lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java

lucene/core/src/java/org/apache/lucene/util/automaton/CompiledAutomaton.java

lucene/core/src/java/org/apache/lucene/util/automaton/NFARunAutomaton.java

lucene/core/src/java/org/apache/lucene/util/automaton/Stepable.java

lucene/core/src/java/org/apache/lucene/util/automaton/TransitionAccessor.java

mikemccand · 2021-08-23T14:54:52Z

lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java

+        }
+      }
+      Automaton dfa = regExp.toDFA();
+      NFARunAutomaton candidate = new NFARunAutomaton(regExp.toNFA());


Another maybe powerful way to test NFA behavior would be to create any random DFA (e.g. make random set of strings and call that Daciuk/Mihov builder, or random RegExp.toDFA() like here, or maybe even randomly construct something state by state and transition by transition), then create a new test-only method that converts any DFA back into an NFA by randomly picking a DFA state and duplicating it (preserve all incoming and leaving transitions), or maybe by generating N strings accepted by the DFA and unioning them back into it. Do that N times so N states get duplicated. This should not alter the language accepted by the automaton, but should make it very "N".

Finally, from the DFA, randomly enumerate strings it accepts, and then assert the NFARunAutomaton also accepts them. And, sometimes randomly generate random strings that are not accepted by the DFA, and confirm the NFA also does not accept them.

mikemccand · 2021-08-23T15:01:38Z

lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java

+        continue;
+      }
+      Query dfaQuery = new AutomatonQuery(new Term(FIELD), a);
+      Query nfaQuery = new AutomatonQuery(new Term(FIELD), a, 0);


It's very cool you are able to completely reuse AutomatonQuery to run either an NFA or DFA by using determinizeWorkLimit = 0 to mean "create an NFA", since you made the two interfaces. Versus forking and then requiring the user make an NFAAutomatonQuery.

But I think it's a little dangerous to take such a low-level approach? (Your // nocommit above). Because that approach means anyone who calls determinize with a 0 work limit will quietly get an NFA back, not just AutomatonQuery who knows how to handle the NFA properly. I.e. other places that call determinize expect to always get back a deterministic automaton? Maybe, instead, we could instead just change AutomatonQuery to take an optional determinizeUpFront boolean, which would default to true (preserving current behavior)?

Or, maybe keep the determinizeWorkLimit==0 to mean "use NFA", but move that if up from way down in determinize to higher up in AutomatonQuery?

mikemccand

I like this change! I think it is close.

I would love to know how this new "determinize on demand" performs, but we have to fix luceneutil to count Query init cost, and then find some "realistic" regexps to test.

But since we are not changing the behavior of AutomatonQuery (it still determinizes up-front by default), I don't think we need to block pushing this (once we iterate on all feedback) on benchmark results?

...ckward-codecs/src/java/org/apache/lucene/backward_codecs/lucene40/blocktree/FieldReader.java

lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java

mikemccand · 2021-09-27T15:48:23Z

lucene/codecs/src/java/org/apache/lucene/codecs/memory/DirectPostingsFormat.java

-        runAutomaton = compiled.runAutomaton;
-        compiledAutomaton = compiled;
+        if (compiled.nfaRunAutomaton != null) {
+          this.runAutomaton = compiled.nfaRunAutomaton;


Maybe instead of having separate nfaRunAutomaton and runAutomaton we could have only runAutomaton and a separate CompiledAutomaton.isDeterminized boolean?

I'm hesitating on doing that since there might be some method still rely on the truth that runAutomaton is of type RunAutomaton but not only using the method abstracted by ByteRunnable.

Also it might be a bit easier to spot if the nfaRunAutomaton is having problem at this stage? Probably we want to merge it after the NFA one is more mature?

OK we can wait on this. Maybe just add a comment explaining why we need this odd if still, here and in the other places where we did this.

I too would really prefer it if we can avoid the current "if". It starts to look like a "dance" in all the places where we do it. I don't understand about how it makes debugging easier, can't we just print automaton.isDeterministic() ?

Hmmm I checked it again and it's a bit complex to merge the runAutomaton and nfaRunAutomaton into one, because previously runAutomaon is having public access and kind of wildly used, it's type directly appears in some public API such as QueryVisitor#consumesTermsMatching, so it might take another PR to try to merge them, I left a todo for now.

An alternative way I'm thinking about is to move those if inside the CompiledAutomaton, and only expose methods like getByteRunnable so that people don't manipulate them outside, I'll include that in next commit.

mikemccand · 2021-09-29T12:26:07Z

lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java

+   *     UTF32ToUTF8 conversion
+   * @param runnableType NFA or DFA
+   */
+  public AutomatonQuery(


Cool, so the existing ctor remains, defaulting to DFA execution strategy, where the automaton is first fully determinized.

But now you add another ctor, letting users also ask for NFA execution, where the automaton is determinized lazily on-demand and only in those parts that the terms in this index need to visit.

lucene/core/src/java/org/apache/lucene/util/automaton/ByteRunnable.java

lucene/core/src/java/org/apache/lucene/util/automaton/NFARunAutomaton.java

lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java

mikemccand · 2021-09-29T14:20:08Z

lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java

+        continue;
+      }
+      AutomatonQuery dfaQuery = new AutomatonQuery(new Term(FIELD), a);
+      AutomatonQuery nfaQuery = new AutomatonQuery(new Term(FIELD), a, ByteRunnable.TYPE.NFA);


Could you add a new LuceneTestCase method, newAutomatonQuery, and it would randomly pick between NFA and DFA type? And then in a few pre-existing tests, let's call newAutomatonQuery instead of new AutomatonQuery?

That method could also randomly make the automaton non-deterministic by simple cloning a few states? We can do this in a follow-on issue.

Good idea! Let's delay it a bit to the next PR maybe (I plan to introduce NFARegexQuery in next PR probably).

lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java

lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java

mikemccand

I think this is getting really close! Thanks @zhaih!

mikemccand · 2021-10-08T13:35:45Z

lucene/codecs/src/java/org/apache/lucene/codecs/memory/DirectPostingsFormat.java

-        runAutomaton = compiled.runAutomaton;
-        compiledAutomaton = compiled;
+        if (compiled.nfaRunAutomaton != null) {
+          this.runAutomaton = compiled.nfaRunAutomaton;


OK we can wait on this. Maybe just add a comment explaining why we need this odd if still, here and in the other places where we did this.

...ckward-codecs/src/java/org/apache/lucene/backward_codecs/lucene40/blocktree/FieldReader.java

mikemccand · 2021-10-08T13:40:44Z

lucene/core/src/java/org/apache/lucene/util/automaton/ByteRunnable.java

+    /**
+     * Determinize the automaton lazily on-demand as terms are intersected. This option saves the
+     * up-front determinize cost, and can handle some RegExps that DFA cannot, but intersection will
+     * be a bit slower


Missing period at the end of the sentence?

Maybe link to Russ Cox's famous page (https://swtch.com/~rsc/regexp/regexp1.html) and point out that this is similar to the Thompson NFA approach described there?

Oh this page is linked in NFARunAutomaton's javadoc already :)

mikemccand · 2021-10-08T13:41:14Z

lucene/core/src/java/org/apache/lucene/util/automaton/ByteRunnable.java

+public interface ByteRunnable {
+
+  /** NFA or DFA */
+  enum TYPE {


Could you add @lucene.experimental to the TYPE javadocs, and on each of the options (NFA, DFA)?

Also, could you pull out this enum into its own class under o.a.l.util.automaton, maybe AutomatonExecutionMode or AutomatonIntersectionMode or AutomatonExecutionStrategy (getting longish to type...)? I think it is too obscure living down inside this non-consumable (to Lucene users who don't know all sorts of details about automata) ByteRunnable now.

Good idea, I renamed it to RunAutomatonMode. How's that sounds like?

lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java

zhaih · 2021-10-12T17:02:10Z

Hmmm one of the test failed

ERROR: The following test(s) have failed:
> Task :lucene:analysis:smartcn:test
  - org.apache.lucene.index.TestIndexFileDeleter.testExcInDecRef (:lucene:core)
:lucene:analysis:smartcn:test (SUCCESS): 21 test(s)
    Test output: /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestIndexFileDeleter.txt

    Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.index.TestIndexFileDeleter.testExcInDecRef" -Ptests.jvms=1 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=FD14DA9475FFAE2C -Ptests.slow=false -Ptests.file.encoding=UTF-8

But I can't reproduce locally...

dweiss · 2021-10-12T19:01:42Z

The stack trace is interesting though - looks like access to a closed reader pool:

org.apache.lucene.index.TestIndexFileDeleter > testExcInDecRef FAILED
    org.apache.lucene.store.AlreadyClosedException: ReaderPool is already closed
        at __randomizedtesting.SeedInfo.seed([FD14DA9475FFAE2C:1489ADA6033649D1]:0)
        at app//org.apache.lucene.index.ReaderPool.get(ReaderPool.java:400)
        at app//org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3742)
        at app//org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:590)
        at app//org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:474)
        at app//org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:406)
        at app//org.apache.lucene.index.TestIndexFileDeleter.testExcInDecRef(TestIndexFileDeleter.java:484)

dweiss · 2021-10-12T19:03:08Z

I re-ran the jobs. The stack trace from the error does look suspicious though.

zhaih · 2021-10-14T00:50:07Z

Thanks @dweiss seems this is not the first time we see this error: https://issues.apache.org/jira/browse/LUCENE-9839

mikemccand · 2021-11-19T16:21:32Z

Thanks @dweiss seems this is not the first time we see this error: https://issues.apache.org/jira/browse/LUCENE-9839

Looks like this is (scarily!) pre-existing. I don't think it should block this change, and I cannot see anything here that might cause that scary exception.

mikemccand

Sorry for the slow reply here! The change looks awesome @zhaih. It defaults execution the current approach (DFA for up front determinize) but opens the option to do NFA instead, which determinizes on demand. I left a tiny comment but otherwise I think this is ready. I think we should target 9.x / main for this.

mikemccand · 2021-11-19T16:22:04Z

lucene/CHANGES.txt


 New Features

+* LUCENE-10010 Introduce NFARunAutomaton to run NFA directly. (Haoyu Zhai)


Maybe change to Patrick Zhai for consistency?

Ah nice catch!

rmuir · 2021-11-29T07:45:28Z

But since we are not changing the behavior of AutomatonQuery (it still determinizes up-front by default), I don't think we need to block pushing this (once we iterate on all feedback) on benchmark results?

Can we at least run existing benchmarks to confirm it doesn't introduce regressions? Some of the code is performance sensitive and we are adding abstractions here. damn java.

I'm still confused about the API, as it adds lots of booleans/enums. Do we really need enums, or can a simple boolean suffice? Does AutomatonQuery really need such boolean, or should it just look at Automaton.isDeterministic to determine what to do? Depending on what performance shows, why even keep around a both determinizeWorkLimit and the booleans/enums? Instead of throwing an exception, why not fall back to NFA if you really want to run some crazy regexp?

rmuir · 2021-11-29T09:28:11Z

I made a quick prototype with what i mean for the API: #485

The idea is that AutomatonQuery shouldn't be determinizing. Let's push this to the caller. If they pass it a DFA, it uses DFA algorithm. If they pass it NFA, it can use the NFA algorithm (it currently throws an exception in my branch, instead of slowly determinizing, that is the change).

zhaih · 2021-11-29T23:24:43Z

Thanks @rmuir, I'll run a benchmark to ensure this PR does not introduce regression recently.

I like the approach you proposed in #485, it would be nice if we can get rid of determinizeWorkLimit in some classes that previously exists everywhere. One reason for carrying an enum and the determinizeWorkLimit together is that we might want to use that determinizeWorkLimit to limit the number of state that NFA can cache as well. But that's a feature not implemented yet and could be done in some other ways.

I think we can try to get that pushed and then I can rebase this one after.

zhaih · 2021-12-02T22:29:47Z

@rmuir I've done the benchmark! (Sorry for the delay): and result looks plain (which is good for this change)

                    Task    QPS base      StdDev    QPS cand      StdDev                Pct diff p-value
            OrNotHighMed      523.45      (3.0%)      518.00      (1.7%)   -1.0% (  -5% -    3%) 0.176
                HighTerm     1385.51      (1.9%)     1374.10      (2.5%)   -0.8% (  -5% -    3%) 0.232
                  Fuzzy2       35.19      (5.9%)       34.91      (5.6%)   -0.8% ( -11% -   11%) 0.666
   HighTermDayOfYearSort      779.01      (2.5%)      772.93      (2.4%)   -0.8% (  -5% -    4%) 0.315
                 Respell       36.33      (1.3%)       36.05      (1.6%)   -0.8% (  -3% -    2%) 0.086
        HighSloppyPhrase        5.08      (6.2%)        5.04      (6.1%)   -0.8% ( -12% -   12%) 0.700
              AndHighLow      349.64      (3.1%)      347.17      (3.5%)   -0.7% (  -7% -    6%) 0.501
       HighTermMonthSort       37.82      (7.0%)       37.55      (6.9%)   -0.7% ( -13% -   14%) 0.751
                 LowTerm     1041.47      (2.5%)     1034.72      (2.1%)   -0.6% (  -5% -    3%) 0.367
            OrNotHighLow      558.55      (2.0%)      555.05      (2.3%)   -0.6% (  -4% -    3%) 0.358
              TermDTSort      149.25      (3.7%)      148.36      (3.3%)   -0.6% (  -7% -    6%) 0.592
         LowSloppyPhrase       46.63      (3.7%)       46.37      (3.9%)   -0.6% (  -7% -    7%) 0.647
           OrHighNotHigh      454.25      (1.6%)      452.05      (1.6%)   -0.5% (  -3% -    2%) 0.341
         MedSloppyPhrase       22.73      (2.4%)       22.63      (3.0%)   -0.4% (  -5% -    5%) 0.627
             AndHighHigh       20.19      (2.4%)       20.11      (2.0%)   -0.4% (  -4% -    4%) 0.564
              AndHighMed       42.22      (2.1%)       42.05      (2.0%)   -0.4% (  -4% -    3%) 0.549
            OrHighNotLow      578.56      (1.7%)      576.35      (2.8%)   -0.4% (  -4% -    4%) 0.599
    HighIntervalsOrdered       11.48      (4.0%)       11.44      (3.9%)   -0.4% (  -7% -    7%) 0.759
    HighTermTitleBDVSort        9.27      (2.8%)        9.23      (2.7%)   -0.4% (  -5% -    5%) 0.664
                 MedTerm      963.30      (2.1%)      959.98      (1.5%)   -0.3% (  -3% -    3%) 0.544
                Wildcard       83.18      (4.2%)       82.91      (4.4%)   -0.3% (  -8% -    8%) 0.811
     LowIntervalsOrdered       11.41      (3.2%)       11.38      (2.8%)   -0.3% (  -6% -    5%) 0.768
     MedIntervalsOrdered       16.77      (2.6%)       16.72      (2.3%)   -0.3% (  -5% -    4%) 0.722
            HighSpanNear        1.06      (1.0%)        1.05      (1.3%)   -0.2% (  -2% -    2%) 0.506
             LowSpanNear        3.06      (1.1%)        3.05      (1.4%)   -0.2% (  -2% -    2%) 0.599
BrowseDayOfYearSSDVFacets        2.88      (2.5%)        2.87      (2.6%)   -0.2% (  -5% -    5%) 0.827
           OrNotHighHigh      466.02      (2.3%)      465.36      (1.9%)   -0.1% (  -4% -    4%) 0.828
    BrowseDateTaxoFacets        0.84      (5.3%)        0.84      (4.4%)   -0.1% (  -9% -   10%) 0.933
                  IntNRQ       28.10      (0.7%)       28.08      (1.2%)   -0.1% (  -1% -    1%) 0.762
BrowseDayOfYearTaxoFacets        0.84      (5.3%)        0.84      (4.4%)   -0.1% (  -9% -   10%) 0.957
   BrowseMonthSSDVFacets        2.99      (3.5%)        2.99      (3.7%)   -0.1% (  -7% -    7%) 0.962
   BrowseMonthTaxoFacets        0.87      (5.6%)        0.87      (4.7%)   -0.0% (  -9% -   10%) 0.979
               OrHighLow      373.02      (3.0%)      372.89      (2.7%)   -0.0% (  -5% -    5%) 0.968
                PKLookup      131.84      (2.4%)      131.83      (2.9%)   -0.0% (  -5% -    5%) 0.992
                 Prefix3       93.52      (3.2%)       93.51      (3.5%)   -0.0% (  -6% -    6%) 0.997
              OrHighHigh       30.69      (2.1%)       30.69      (2.1%)    0.0% (  -4% -    4%) 0.989
             MedSpanNear       23.34      (1.3%)       23.35      (1.2%)    0.0% (  -2% -    2%) 0.958
                  Fuzzy1       44.19      (7.7%)       44.20      (8.2%)    0.0% ( -14% -   17%) 0.991
              HighPhrase       89.39      (2.0%)       89.43      (1.9%)    0.0% (  -3% -    4%) 0.943
               LowPhrase       27.84      (2.0%)       27.88      (2.6%)    0.1% (  -4% -    4%) 0.853
               OrHighMed       31.74      (1.9%)       31.80      (1.8%)    0.2% (  -3% -    3%) 0.763
               MedPhrase       82.20      (1.7%)       82.54      (2.4%)    0.4% (  -3% -    4%) 0.520
            OrHighNotMed      501.51      (1.7%)      505.78      (2.3%)    0.9% (  -3% -    4%) 0.181

… IntersectTermsEnum

…tomaton

…to "toDFA" since that changes too many files

…imit == 0

sonatype-lift · 2021-12-03T02:00:38Z

lucene/core/src/java/org/apache/lucene/util/automaton/NFARunAutomaton.java

+    }
+
+    @Override
+    public boolean equals(Object o) {


EqualsGetClass: Overriding Object#equals in a non-final class by using getClass rather than instanceof breaks substitutability of subclasses. (details)
(at-me in a reply with help or ignore)

@sonatype-lift ignore

I've recorded this as ignored for this pull request. If you change your mind, just comment @sonatype-lift unignore.

rmuir · 2021-12-04T05:36:27Z

ok, see #513 which is another PR like the #485, just for RegExp class. all the trappy minimization is removed, it may return a DFA or NFA: and it is the callers choice to minimize. Combined with #485 I think it gives us the opportunity for a simple API, e.g. this stuff would just happen in one place (e.g. RegExpQuery class) rather than strewn all about.

rmuir · 2021-12-08T02:57:41Z

ok, sorry, I realize the latest two PR-split-outs still don't solve your problem. The API is still ugly because we still minimize() regexps, and that implies determinize()... but maybe isn't obvious.

I don't think we need to minimize() any more, I opened a separate discussion: https://issues.apache.org/jira/browse/LUCENE-10296. I will try to prototype a PR.

Instead, the decisions on this issue should just be about DFA or NFA!

rmuir · 2021-12-08T03:29:37Z

ok one more, but I think it sets us up even better: #528

lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java

rmuir · 2021-12-09T18:32:53Z

lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java

+        try {
+          regExp = new RegExp(AutomatonTestUtil.randomRegexp(random()));
+        } catch (IllegalArgumentException e) {
+          ignoreException(e);


You can also annotate the IllegalArgumentException with @SuppressWarnings("unused").

as far as the random regexp, I think it already does what you want? One tricky thing is, the strings it generates are only valid if you then build regexp with RegExp.NONE. That is probably what causes confusion here.

rmuir · 2021-12-09T18:35:18Z

lucene/codecs/src/java/org/apache/lucene/codecs/memory/DirectPostingsFormat.java

-        runAutomaton = compiled.runAutomaton;
-        compiledAutomaton = compiled;
+        if (compiled.nfaRunAutomaton != null) {
+          this.runAutomaton = compiled.nfaRunAutomaton;


I too would really prefer it if we can avoid the current "if". It starts to look like a "dance" in all the places where we do it. I don't understand about how it makes debugging easier, can't we just print automaton.isDeterministic() ?

zhaih · 2021-12-20T19:56:17Z

OK @rmuir some new commits are ready to be reviewed! Please take your time :)

rmuir

Thank you for all the hard work and patience here @zhaih !

zhaih · 2021-12-20T22:39:46Z

Thank you @rmuir and @mikemccand for reviewing this big PR! I'll merge it myself :)

zhaih force-pushed the main branch 2 times, most recently from 3f72684 to abe3ab1 Compare July 26, 2021 22:19

mikemccand reviewed Jul 27, 2021

View reviewed changes

zhaih force-pushed the main branch from f348c5c to 63eb065 Compare August 20, 2021 02:41

mikemccand reviewed Aug 23, 2021

View reviewed changes

zhaih marked this pull request as ready for review September 21, 2021 15:33

mikemccand reviewed Sep 29, 2021

View reviewed changes

lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java Outdated Show resolved Hide resolved

mikemccand reviewed Oct 8, 2021

View reviewed changes

zhaih force-pushed the main branch from 161dcbd to 879bf5c Compare October 12, 2021 04:34

mikemccand approved these changes Nov 28, 2021

View reviewed changes

zhaih mentioned this pull request Nov 29, 2021

LUCENE-10010: don't determinize in CompiledAutomaton/RunAutomaton #485

Merged

zhaih added 7 commits December 2, 2021 15:14

LUCENE-10010 Introduce NFARunAutomaton to run NFA directly

00692b7

Add more javadocs

b426b19

Address comments & define next step interfaces for incorporating with…

50def56

… IntersectTermsEnum

Hacking through the IntersectTermEnum to make it able to use NFARunAu…

6c367bc

…tomaton

Add classmap optimization

9957378

NFAQuery now supported by all codecs

34f5446

Improve CompiledAutomaton interface, revert changes of "toAutomaton" …

3d0ce11

…to "toDFA" since that changes too many files

zhaih added 7 commits December 2, 2021 15:14

Fix RegExp nocommit

85944fd

Javadoc fix

5bdb78f

Add NFA/DFA enum to AutomatonQuery, instead of using determinizeWorkL…

96725ce

…imit == 0

better documentation

d774966

More javadoc update, adding lucene experimental

b96d8d9

Fix assertion error in unit test, rename some variables

3f63d6e

Change CHNAGES.txt entry to 9.1, also change author name

a38ae57

zhaih force-pushed the main branch from 4f4f79b to a38ae57 Compare December 2, 2021 23:15

sonatype-lift bot reviewed Dec 3, 2021

View reviewed changes

zhaih mentioned this pull request Dec 5, 2021

LUCENE-10010: don't determinize/minimize in RegExp #513

Merged

zhaih added 4 commits December 8, 2021 15:21

Merge remote-tracking branch 'upstream/main' into main

d5e5608

Remove RunAutomatonMode

6f1115c

Remove test that fail with nfa

82a2039

Merge remote-tracking branch 'upstream/main' into main

eff0d44

rmuir requested changes Dec 9, 2021

View reviewed changes

zhaih added 2 commits December 20, 2021 11:41

add getTransitionAccessor and getByteRunnable to CompiledAutomaton

55fa135

Merge remote-tracking branch 'upstream/main' into main

41d596f

Javadoc for new methods

c50f6ab

rmuir approved these changes Dec 20, 2021

View reviewed changes

zhaih merged commit c972c6a into apache:main Dec 20, 2021

asfimport mentioned this pull request Dec 21, 2021

Should we have a NFA Query? [LUCENE-10010] #11049

Open


		New Features

		* LUCENE-10010 Introduce NFARunAutomaton to run NFA directly. (Haoyu Zhai)

Conversation

zhaih commented Jul 26, 2021

Description

Tests

Checklist

Uh oh!

mikemccand left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rmuir commented Jul 28, 2021

Uh oh!

mikemccand commented Jul 28, 2021

Uh oh!

mikemccand left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mikemccand left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mikemccand left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dweiss commented Oct 12, 2021 •

edited

Loading