Implement Setting Deduplication via String Interning (#80493) by original-brownbear · Pull Request #80590 · elastic/elasticsearch

original-brownbear · 2021-11-10T10:58:34Z

This is a somewhat crude solution to #78892 that addresses
95%+ of duplicate setting entry memory consumption in large clusters.
The remaining duplicate structures (lists of all the same strings) are
comparatively cheap in their heap consumption.
In heavy benchmarking for #77466, no runtime impact from adding this extra step
to setting creation was found, despite pushing setting creation harder
than is expected in real-world usage. Part of the reason for the low relative impact is
that populating a tree-map is quite expensive to begin with, so adding
string interning, which is fast via the CHM (ConcurrentHashMap) cache, doesn't add much overhead.
On the other hand, the heap use impact for use cases that come with a large number
of duplicate settings (many similar indices) is significant. As an example,
10k AuditBeat indices consume about 500M of heap for duplicate settings data structures
without this change. This change brings the heap consumption from duplicate settings down to
O(1M) on every node in the cluster.
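
To illustrate the idea only, here is a minimal sketch of interning setting strings through a `ConcurrentHashMap` while populating a `TreeMap`. The class and method names (`SettingInterner`, `intern`, `internSettings`) are hypothetical and are not taken from the Elasticsearch code base; the actual change may differ in structure and in how the cache is bounded.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; not the actual Elasticsearch class.
final class SettingInterner {
    // Maps each distinct string to the single canonical instance that is kept alive.
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    private SettingInterner() {}

    // Returns the canonical instance equal to s, registering s if none exists yet.
    static String intern(String s) {
        if (s == null) {
            return null;
        }
        String existing = CACHE.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    // Builds a sorted settings map whose keys and values are interned, so that
    // many similar indices share the same String objects on the heap.
    static TreeMap<String, String> internSettings(Map<String, String> raw) {
        TreeMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            out.put(intern(e.getKey()), intern(e.getValue()));
        }
        return out;
    }
}
```

With this sketch, two maps built from equal but distinct strings end up referencing the same `String` instances, so the per-index cost shrinks to the map structure itself. Note that the cache here is unbounded, which is an assumption made for brevity; a production version would need to consider cache growth.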

Relates to and addresses most of #78892
Relates #77466

backport of #80493

elasticmachine · 2021-11-10T10:58:37Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

original-brownbear added the >enhancement, :Core/Infra/Settings, and backport labels Nov 10, 2021

elasticmachine added the Team:Core/Infra label Nov 10, 2021

elasticsearchmachine added the v8.0.0 label Nov 10, 2021

original-brownbear merged commit 63a45e4 into elastic:8.0 Nov 10, 2021

original-brownbear deleted the 80493-8.0 branch November 10, 2021 11:50

mark-vieira added v8.0.0-rc1 and removed v8.0.0 labels Jan 12, 2022

original-brownbear mentioned this pull request Jan 17, 2022

Implement Setting Deduplication via String Interning (#80493) #82659

Merged
