Add support for fragment_length in the unified highlighter#23431
Add support for fragment_length in the unified highlighter#23431jimczi merged 2 commits intoelastic:masterfrom jimczi:bounded_break_iterator
Conversation
nik9000
left a comment
There was a problem hiding this comment.
Left a few minors. As I said, I'm not a fan of using BreakIterator for something so narrow but this is a big improvement so it is worth getting in regardless.
There was a problem hiding this comment.
I really don't like BreakIterator for this. It is a much wider interface than the highlighter needs. But it is what we have to work with. So we shall!
There was a problem hiding this comment.
Should these be hard checks? Or have a message associated with them?
There was a problem hiding this comment.
Yeah, I'd make these a hard check and throw a new IllegalArgumentException("Usage doesn't look like UnifiedHighlighter").
There was a problem hiding this comment.
It is probably worth commenting why this is needed too.
There was a problem hiding this comment.
If I reading this right, this is the only change to the file. Everything else is just word wrapping?
There was a problem hiding this comment.
Can you add a non-random test just so it is easier to see? One with a "normal" sentence and maybe one with a long one? Just for illustration purposes.
There was a problem hiding this comment.
Ah, you have enough in CustomUnifiedHighlighterTests, I think.
There was a problem hiding this comment.
I'd throw an exception if it is > 0 rather than ignore it.
There was a problem hiding this comment.
The default value is 100 so it feels weird to set an explicit fragment_length:0 to make this mode works ?
This commit introduce a new break iterator (a BoundedBreakIterator) designed for the unified highlighter that is able to limit the size of fragments produced by generic break iterator like `sentence`. The `unified` highlighter now supports `boundary_scanner` which can `words` or `sentence`. The `sentence` mode will use the bounded break iterator in order to limit the size of the sentence to `fragment_length`. When sentences bigger than `fragment_length` are produced, this mode will break the sentence at the next word boundary **after** `fragment_length` is reached.
|
Thanks @nik9000 |
* Add support for fragment_length in the unified highlighter This commit introduce a new break iterator (a BoundedBreakIterator) designed for the unified highlighter that is able to limit the size of fragments produced by generic break iterator like `sentence`. The `unified` highlighter now supports `boundary_scanner` which can `words` or `sentence`. The `sentence` mode will use the bounded break iterator in order to limit the size of the sentence to `fragment_length`. When sentences bigger than `fragment_length` are produced, this mode will break the sentence at the next word boundary **after** `fragment_length` is reached.
* master: Eclipse: move print margin to 100 columns Add support for fragment_length in the unified highlighter (elastic#23431)
* master: Eclipse: move print margin to 100 columns Add support for fragment_length in the unified highlighter (elastic#23431)
* single-node-discovery: Eclipse: move print margin to 100 columns Add support for fragment_length in the unified highlighter (elastic#23431)
This commit introduce a new break iterator (a BoundedBreakIterator) designed for the
unifiedhighlighter that is able to limit the size of fragments produced by generic break iterator likesentence.The
unifiedhighlighter now supportsboundary_scannerwith the following mode:wordsandsentence. Thesentencemode uses the new bounded break iterator in order to limit the size of the sentence tofragment_length.When sentences bigger than
fragment_lengthare produced, this mode will break the sentence at the next word boundary afterfragment_lengthis reached.