[ML] Fixing categorization token highlighting for multi-line messages by jgowdyelastic · Pull Request #103007 · elastic/kibana

jgowdyelastic · 2021-06-22T20:32:47Z

Fixes issue introduced in elastic/elasticsearch#73828

Multi-line messages are no longer cut off after the end of the last matched token.

The issue, as described by @droberts195:
The problem arises when there's a token that ends at the end of the first line of the message.
Because the first_non_blank_line char filter deletes everything after the first line, that token is reported as ending at the very end of the original message, even though the token itself is short.
Then, when highlighting, the UI replaces the last token on the first line plus all the other lines with the single short token.
This makes it look as though the second and subsequent lines never existed.

The solution is to base our end_offset on the token length, rather than on the end_offset supplied by the analyze endpoint.
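A minimal sketch of the idea, not the code from this PR. The fields `token`, `start_offset` and `end_offset` are the ones the analyze endpoint returns for each token. Everything else is hypothetical and only there for illustration: the `AnalyzeToken` interface name, the helpers `fixEndOffsets` and `highlight`, the square-bracket markers, and the sample message. It also assumes the token text has the same length as the span it was taken from, which may not hold if a token filter changes token length.

```typescript
// Token shape as returned by the analyze endpoint (interface name is hypothetical).
interface AnalyzeToken {
  token: string;
  start_offset: number;
  end_offset: number;
}

// Hypothetical helper: ignore the reported end_offset and derive it from
// start_offset plus the length of the token text.
function fixEndOffsets(tokens: AnalyzeToken[]): AnalyzeToken[] {
  return tokens.map((t) => ({ ...t, end_offset: t.start_offset + t.token.length }));
}

// Hypothetical highlighter: replaces each [start_offset, end_offset) span of the
// message with the token wrapped in square brackets.
function highlight(message: string, tokens: AnalyzeToken[]): string {
  let out = '';
  let pos = 0;
  for (const t of tokens) {
    out += message.slice(pos, t.start_offset); // untouched text before the token
    out += '[' + t.token + ']';
    pos = t.end_offset;
  }
  return out + message.slice(pos); // untouched text after the last token
}

// Sample two-line message (made up for this sketch).
const message = 'Connection failed\nretrying in 5s';

// Simulated analyze output: the char filter dropped everything after the first
// line, so the last token is reported as ending at the end of the whole message.
const reported: AnalyzeToken[] = [
  { token: 'Connection', start_offset: 0, end_offset: 10 },
  { token: 'failed', start_offset: 11, end_offset: message.length },
];

// Using the reported offsets swallows the second line.
const broken = highlight(message, reported);

// Using offsets derived from the token length keeps the second line.
const fixed = highlight(message, fixEndOffsets(reported));
```

With the reported offsets, the last token's span covers the rest of the message, so the second line disappears from the highlighted output. With the recomputed offsets, the span stops at the end of `failed` and the second line is kept.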

elasticmachine · 2021-06-22T20:32:49Z

Pinging @elastic/ml-ui (:ml)

droberts195

LGTM

droberts195 · 2021-06-23T09:13:03Z

> Fixes issue introduced in elastic/elasticsearch#73828

Just to clarify, elastic/elasticsearch#73828 means people using the default categorization analyzer would have seen the problem from 7.14 onwards. However, it has always been a bug: if you had a char_filter that deleted the end of the original message, and a token sat right at the end of the truncated message, this problem would have occurred. The difference is that in 7.13 and earlier the default categorization analyzer didn't contain a char_filter, so only advanced users who defined their own categorization analyzer would have seen it.

I am not suggesting that we backport this to 7.13, but I do think it should be release noted as a bug fix, just in case someone ever observes the effect on an older version.

peteharverson

LGTM

jgowdyelastic · 2021-06-29T06:13:00Z

@elasticmachine merge upstream

kibanamachine · 2021-06-29T08:20:57Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💚 Build #133338 succeeded 0048bc8

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jgowdyelastic

kibanamachine · 2021-06-29T09:31:45Z

💚 Backport successful

Status	Branch	Result
✅	7.x

This backport PR will be merged automatically after passing CI.

…103627) Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: James Gowdy <jgowdy@elastic.co>

[ML] Fixing categorization tokens for multi-line messages

0048bc8

jgowdyelastic added the bug, review, :ml, Feature:Anomaly Detection, v8.0.0, and v7.14.0 labels Jun 22, 2021

jgowdyelastic requested review from droberts195 and peteharverson June 22, 2021 20:32

jgowdyelastic self-assigned this Jun 22, 2021

jgowdyelastic requested a review from a team as a code owner June 22, 2021 20:32

droberts195 approved these changes Jun 23, 2021

droberts195 changed the title ~~[ML] Fixing categorization tokens for multi-line messages~~ [ML] Fixing categorization token highlighting for multi-line messages Jun 23, 2021

droberts195 added the release_note:fix label Jun 23, 2021

peteharverson approved these changes Jun 23, 2021

Merge branch 'master' into fixing-categorization-tokens-for-multi-lin…

2e65999

…e-messages

jgowdyelastic added the auto-backport label Jun 29, 2021

jgowdyelastic merged commit 824463a into elastic:master Jun 29, 2021

kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 29, 2021

[ML] Fixing categorization tokens for multi-line messages (elastic#10…

0568275

…3007) Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

kibanamachine mentioned this pull request Jun 29, 2021

[7.x] [ML] Fixing categorization tokens for multi-line messages (#103007) #103627

Merged

jgowdyelastic deleted the fixing-categorization-tokens-for-multi-line-messages branch July 19, 2021 08:51

[ML] Fixing categorization token highlighting for multi-line messages #103007
jgowdyelastic merged 2 commits into elastic:master from jgowdyelastic:fixing-categorization-tokens-for-multi-line-messages