Allow plugins to register pre-configured tokenizers#24751
Merged
nik9000 merged 8 commits intoelastic:masterfrom May 19, 2017
Merged
Allow plugins to register pre-configured tokenizers#24751nik9000 merged 8 commits intoelastic:masterfrom
nik9000 merged 8 commits intoelastic:masterfrom
Conversation
one test wasn't really testing anything even though it looked like it was. It tried to test caching but it failed at it. I'm not actually sure what good the caching provides. Adds unit test for registering tokenizers via plugins.
rjernst
approved these changes
May 18, 2017
| /** | ||
| * Shared implementation for pre-configured analysis components. | ||
| */ | ||
| abstract class PreConfiguredAnalysisComponent<T> implements AnalysisModule.AnalysisProvider<T> { |
Member
There was a problem hiding this comment.
Seems like this should be public since they should not be required to be under the o.e.i.analysis package?
| tokenizers.add(PreConfiguredTokenizer.singleton("lowercase", LowerCaseTokenizer::new, () -> new TokenFilterFactory() { | ||
| @Override | ||
| public String name() { | ||
| return "lowercase"; |
Member
There was a problem hiding this comment.
I wonder if (as a followup) name() could be removed from TokenFilterFactory? It seems to only have 3 uses, one of which is a test. Then TokenFilterFactory could be a functional interface and this would look much cleaner:
tokenizers.add(PreConfiguredTokenizer.singleton("lowercase", LowerCaseTokenizer::new, LowerCaseTokenFilter::new));
Member
Author
There was a problem hiding this comment.
Yes! Let me have a look later on.
We have an issue tracking it.
75 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Allows plugins to register pre-configured tokenizers. Much of the decisions are the same as those in #24223, #24572, and #24223. This only migrates the
lowercasetokenizer but I figure that is a good start because it proves out the features.