Ukrainian language plugin can fill up heap by romseygeek · Pull Request #71998 · elastic/elasticsearch

romseygeek · 2021-04-21T09:00:26Z

The Lucene Ukrainian analyzer has a bug where a large in-memory
dictionary is loaded and stored in a thread local for every token stream
created on a new thread (for more details see
https://issues.apache.org/jira/browse/LUCENE-9930). Because of checks
added in #50908, we create a token stream for every registered
analyzer in every shard, which means that any node with the Ukrainian
plugin installed will leak one copy of this dictionary for every shard,
whether or not the Ukrainian analyzer is actually used.
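
To make the mechanism concrete, here is a minimal Java sketch of the per-thread loading pattern described above. It is not Lucene's code: `PerThreadDictionaryLeak`, its `Dictionary` stand-in and `loadsAfterTokenStreamsOn` are hypothetical names, and the 1 KB array is only a placeholder for the real, large dictionary. It models a single analyzer; in Elasticsearch the description above says this multiplies again across shards. Each fresh thread that fetches the dictionary through a `ThreadLocal` triggers its own load, and in a long-lived thread pool those copies stay reachable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the per-thread loading pattern (not Lucene's code).
final class PerThreadDictionaryLeak {
    // Counts how many times the stand-in dictionary has been loaded.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Stand-in for the large dictionary; the small array is only a placeholder.
    static final class Dictionary {
        final byte[] data;

        Dictionary() {
            LOADS.incrementAndGet();
            data = new byte[1024];
        }
    }

    // Problematic pattern: the first get() on each thread loads a fresh copy,
    // which stays reachable for as long as that thread is alive.
    static final ThreadLocal<Dictionary> PER_THREAD = ThreadLocal.withInitial(Dictionary::new);

    // Simulates creating one token stream on each of `threads` fresh threads
    // and returns how many dictionary copies were loaded.
    static int loadsAfterTokenStreamsOn(int threads) {
        LOADS.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> PER_THREAD.get());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println(loadsAfterTokenStreamsOn(8)); // prints 8
    }
}
```

The load count grows linearly with the number of threads, which is why the cost scales with thread-pool size rather than with actual use of the analyzer.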

This commit makes the plugin use a fixed version of
UkrainianMorfologikAnalyzer until we merge a version of Lucene that
contains the upstream fix.
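
The commit title "Only load one UK dictionary per JVM" suggests the shape of the fix: share a single dictionary instead of loading one per thread. The sketch below shows one common Java way to do that, the initialization-on-demand holder idiom. It is an assumption for illustration, not the code from this PR or from LUCENE-9930, and every name in it (`SharedDictionarySketch`, `Dictionary`, `dictionary()`, `loadsAfterTokenStreamsOn`) is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one dictionary per JVM, shared by every thread.
final class SharedDictionarySketch {
    // Counts how many times the stand-in dictionary has been loaded.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Stand-in for the large dictionary; the small array is only a placeholder.
    static final class Dictionary {
        final byte[] data;

        Dictionary() {
            LOADS.incrementAndGet();
            data = new byte[1024];
        }
    }

    // Initialization-on-demand holder: the JVM initializes Holder at most once,
    // on first access, and class initialization is thread-safe.
    private static final class Holder {
        static final Dictionary INSTANCE = new Dictionary();
    }

    // Every caller, on every thread, receives the same instance.
    static Dictionary dictionary() {
        return Holder.INSTANCE;
    }

    // Simulates token streams created on `threads` fresh threads and
    // returns the total number of dictionary loads so far.
    static int loadsAfterTokenStreamsOn(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> dictionary());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println(loadsAfterTokenStreamsOn(8)); // prints 1
    }
}
```

Because the holder class is initialized lazily and exactly once, the dictionary is loaded only when first needed and never again, no matter how many threads or analyzer instances ask for it. The trade-off is that the shared dictionary must be safe to read concurrently.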

elasticmachine · 2021-04-21T09:00:30Z

Pinging @elastic/es-search (Team:Search)

ppf2 · 2021-05-26T17:09:39Z

@romseygeek Is the version label correct in this PR? It's not listed in the release notes (https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-7.13.0.html). If this didn't make it to 7.13.0, will it be in 7.13.1? Thx!

romseygeek · 2021-05-26T20:42:14Z

Not sure why it's not in the release notes, but it's in the 7.13 release: d6038a3

ppf2 · 2021-05-26T20:52:47Z

Thx for confirming, @romseygeek! I have filed a doc PR to add it (#73440).

Only load one UK dictionary per JVM

ddd4bcf

romseygeek added >bug, :Search Relevance/Analysis, v8.0.0, v7.13.0 and v7.14.0 labels Apr 21, 2021

romseygeek requested a review from jpountz April 21, 2021 09:00

romseygeek self-assigned this Apr 21, 2021

elasticmachine added the Team:Search label Apr 21, 2021

suppressforbidden

ab23e80

jpountz approved these changes Apr 21, 2021

romseygeek merged commit 993f0b0 into elastic:master Apr 21, 2021

romseygeek deleted the bug/ukrainian-analyzer branch April 21, 2021 11:13

ppf2 added a commit that referenced this pull request May 26, 2021

PR 71998 missed in release notes

3b6d497

#71998 was fixed in 7.13.0 but it is missing from the release notes.

ppf2 mentioned this pull request May 26, 2021

[DOCS] Add #71998 to 7.13.0 release notes #73440

Merged

jrodewig pushed a commit that referenced this pull request May 26, 2021

[DOCS] Add #71998 to 7.13.0 release notes

7c64cce

#71998 was fixed in 7.13.0 but was missed in the release notes.

jrodewig mentioned this pull request May 26, 2021

[7.x] [DOCS] Add #71998 to 7.13.0 release notes (#73440) #73443

Merged

jrodewig added a commit that referenced this pull request May 26, 2021

[DOCS] Add #71998 to 7.13.0 release notes (#73443)

79eef1c

#71998 was fixed in 7.13.0 but was missed in the release notes. Co-authored-by: Pius <pius@elastic.co>

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

romseygeek merged 2 commits into elastic:master from romseygeek:bug/ukrainian-analyzer