[ML] Fixing categorization token highlighting for multi-line messages #103007
Pinging @elastic/ml-ui (:ml)
Just to clarify, elastic/elasticsearch#73828 means that people using the default categorization analyzer would have seen the problem starting in 7.14, but it has always been a bug that if you had a […]. I am not suggesting that we backport this to 7.13, but I do think it should be release-noted as a bug fix, in case someone ever observes the effect on an older version.
@elasticmachine merge upstream |
💚 Build Succeeded
💚 Backport successful
This backport PR will be merged automatically after passing CI.
Fixes an issue introduced in elastic/elasticsearch#73828.
The lines of a multi-line message that follow the last matched token are no longer lost.
The issue, as described by @droberts195:
The problem arises when a token ends at the end of the first line of the message.
Because the `first_non_blank_line` char filter deletes everything after the first line, that token is reported as ending at the very end of the original message, even though it is short.
Then, when highlighting, the UI replaces the last token on the first line, plus all the other lines, with that single short token, making it look as if the second and subsequent lines never existed.
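To make the failure mode concrete, here is a minimal TypeScript sketch. It assumes a highlighter that walks the tokens from the Elasticsearch `_analyze` response and replaces the span from `start_offset` to `end_offset` of the original message with the highlighted token. The function name `highlightTokens`, the `[...]` markers and the sample message are hypothetical and are not the actual Kibana code. The offsets mimic what the description reports: the token at the end of the first line gets an `end_offset` equal to the length of the whole message.

```typescript
// Subset of the fields in one token of an Elasticsearch _analyze response.
interface AnalyzeToken {
  token: string;
  start_offset: number;
  end_offset: number;
}

// Hypothetical highlighter: replaces each token's span of the original
// message with a marked-up token, copying the text between tokens unchanged.
function highlightTokens(message: string, tokens: AnalyzeToken[]): string {
  let out = "";
  let cursor = 0;
  for (const t of tokens) {
    out += message.substring(cursor, t.start_offset); // text before the token
    out += `[${t.token}]`; // the whole reported span becomes the token text
    cursor = t.end_offset; // trusts the end_offset reported by _analyze
  }
  return out + message.substring(cursor); // text after the last token
}

const message = "Connection failed\nretrying in 5s";
// Illustrative tokens: first_non_blank_line removed the second line, so only
// first-line tokens exist, and "failed" is reported as ending at the very end
// of the original message.
const tokens: AnalyzeToken[] = [
  { token: "Connection", start_offset: 0, end_offset: 10 },
  { token: "failed", start_offset: 11, end_offset: message.length },
];

console.log(highlightTokens(message, tokens));
// prints "[Connection] [failed]"
```

Because the cursor jumps to the end of the message after `failed`, the final `substring` call returns an empty string and the second line is never emitted.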
The solution is to base our `end_offset` on the token length, rather than on the `end_offset` supplied by the analyze endpoint.
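A minimal TypeScript sketch of this fix, under stated assumptions: a hypothetical `highlightTokens` helper (not the actual Kibana implementation) that computes the end of each highlighted span as `start_offset + token.length` instead of using the `end_offset` returned by the `_analyze` endpoint. The sample message and offsets are illustrative; the second token's `end_offset` is deliberately set to the full message length to reproduce the reported behaviour.

```typescript
// Subset of the fields in one token of an Elasticsearch _analyze response.
interface AnalyzeToken {
  token: string;
  start_offset: number;
  end_offset: number;
}

// Hypothetical highlighter with the fix applied: the span end is derived from
// the token's own length, so an inflated end_offset cannot swallow later text.
function highlightTokens(message: string, tokens: AnalyzeToken[]): string {
  let out = "";
  let cursor = 0;
  for (const t of tokens) {
    const end = t.start_offset + t.token.length; // used instead of t.end_offset
    out += message.substring(cursor, t.start_offset); // text before the token
    out += `[${t.token}]`;
    cursor = end;
  }
  return out + message.substring(cursor); // remaining text, including later lines
}

const message = "Connection failed\nretrying in 5s";
const tokens: AnalyzeToken[] = [
  { token: "Connection", start_offset: 0, end_offset: 10 },
  // Reported end_offset points at the end of the whole message.
  { token: "failed", start_offset: 11, end_offset: message.length },
];

console.log(highlightTokens(message, tokens));
// prints "[Connection] [failed]" and then "retrying in 5s" on the next line
```

This relies on each token's text having the same length as the span it came from in the original message; a token filter that changes token length (stemming, for example) would break that assumption.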