#12276: rename DaciukMihovAutomatonBuilder to StringsToAutomaton#12310
#12276: rename DaciukMihovAutomatonBuilder to StringsToAutomaton#12310mikemccand merged 2 commits intoapache:mainfrom
Conversation
gsmiller
left a comment
There was a problem hiding this comment.
LGTM. Apologies for the merge conflict I created for you (but thanks for the review on that PR!).
|
Yeah we should explore a binary version. Even if it doesn't speedup TermInSetQuery. TermInSetQuery has/had a super trappy visitor method that builds an automaton from it's sorted terms in a very slow way for some visitor method. Pretty sure I added a comment along the lines of "we should not do this". We could at least fix that trap. Check my assumptions and memory as I am on a phone |
No worries! I'll resolve and merge soon! Thanks for the review @gsmiller. |
+1
Oh yeah! I think you are referring to this super scary code! We should indeed make a |
|
Actually, the terms must be sorted in Unicode code point order, and, we do have the builder for |
|
@mikemccand where's the build method you're referencing? I took a pass at creating a "direct to binary" version of the Daciuk-Mihov algorithm in #12312. We could fold that into this work if we want? Or treat it as a separate follow-up task. But maybe there's already some direct binary building logic I wasn't aware of and my changes aren't needed? Edit: Ah, never mind @mikemccand. Found the method you're referencing. |
|
@mikemccand the |
|
Separately, in terms of renaming this class, I'm still in favor of it but I also wonder if we should just consider making it pkg-private and proxying the build functionality through |
This is just a rote rename of this helpful class. I plan 10.0 only since it is technically a public API break. I added a line to MIGRATE.txt too.
Closes #12276