Implement Setting Deduplication via String Interning (#80493) #82659

Merged
original-brownbear merged 1 commit into elastic:7.17 from original-brownbear:80493-7.x
Jan 17, 2022
@original-brownbear
Contributor

@original-brownbear original-brownbear commented Jan 17, 2022

This is a somewhat crude solution to #78892 that addresses
95%+ of the memory consumed by duplicate setting entries in large clusters.
The remaining duplicate structures (lists of all the same strings) are
comparatively cheap in their heap consumption.

In heavy benchmarking for #77466, no runtime impact from adding this extra step
to setting creation was found, despite pushing setting creation harder
than is expected in real-world usage. Part of the reason for the low relative impact is
that populating a tree map is quite expensive to begin with, so adding
string interning, which is fast via the CHM (ConcurrentHashMap) cache, doesn't add much overhead.

On the other hand, the heap savings for use cases with a large number
of duplicate settings (many similar indices) are significant. As an example,
10k AuditBeat indices consume about 500M of heap for duplicate settings data structures
without this change. This change brings the heap consumption from duplicate settings down to
O(1M) on every node in the cluster.
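
For readers unfamiliar with the technique, the following is a minimal sketch of string interning through a `ConcurrentHashMap` cache, applied while populating a tree map of settings. It is not the code from this PR: the class `SettingInterner` and its methods `intern` and `internSettings` are hypothetical names chosen for illustration, and it assumes plain string keys and values.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not the class added by this PR.
final class SettingInterner {
    // Maps each distinct string to the single canonical instance that is kept on the heap.
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    private SettingInterner() {}

    // Returns the canonical instance for s, registering s itself if it is the first of its kind.
    static String intern(String s) {
        if (s == null) {
            return null;
        }
        String existing = CACHE.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    // Builds a sorted settings map whose keys and values are all interned (assumes string values).
    static Map<String, String> internSettings(Map<String, String> raw) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            result.put(intern(entry.getKey()), intern(entry.getValue()));
        }
        return result;
    }
}
```

With a cache like this, many indices that carry equal setting keys and values end up referencing the same `String` instances, so the cost of each distinct string is paid once per node, not once per index. The sketch never evicts entries, which is only reasonable if the set of distinct setting strings stays small.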

Relates and addresses most of #78892
Relates #77466

backport of #80493

@original-brownbear original-brownbear added :Core/Infra/Settings Settings infrastructure and APIs backport labels Jan 17, 2022
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jan 17, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@original-brownbear original-brownbear merged commit e724fe9 into elastic:7.17 Jan 17, 2022
@original-brownbear original-brownbear deleted the 80493-7.x branch January 17, 2022 11:56

Labels

backport :Core/Infra/Settings Settings infrastructure and APIs Team:Core/Infra Meta label for core/infra team v7.17.0
