[7.17.1] Adjust indices.recovery.max_bytes_per_sec according to external settings#83413

Merged
elasticsearchmachine merged 8 commits into elastic:7.17 from
tlrx:backport-node-bandwidth-recovery-settings-on-7.17.1
Feb 9, 2022

@tlrx
Member

@tlrx tlrx commented Feb 2, 2022

This pull request backports the node bandwidth settings merged in 8.1.0.

It cherry-picks the following changes:

The commit e235fea contains the changes for 7.17.1:

  • the user factor settings node.bandwidth.recovery.factor.read and node.bandwidth.recovery.factor.write are removed, keeping only the operator variants set by the platform
  • the operator variants are made static (non-dynamic and not operator-only)
  • a warning is added to the documentation that the recovery settings are not available in 8.0.0

tlrx and others added 5 commits February 2, 2022 14:44
Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node roles, the JVM version and the system total
memory that can be detected.

The current logic to set the default value can be summarized as:

    40 MB for non-data nodes
    40 MB for data nodes that run on a JVM version < 14
    40 MB for data nodes that have one of the data_hot, data_warm, data_content or data roles

Nodes whose only data roles are data_cold and/or data_frozen have a
default value that depends on the available memory:

    with ≤ 4 GB of available memory, the default is 40 MB
    with more than 4 GB and less than or equal to 8 GB, the default is 60 MB
    with more than 8 GB and less than or equal to 16 GB, the default is 90 MB
    with more than 16 GB and less than or equal to 32 GB, the default is 125 MB
    and above 32 GB, the default is 250 MB
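
The default rules listed above can be sketched as a small function. This is only an illustration of the commit message, not the Elasticsearch implementation: the function name, parameter names and role sets are hypothetical, and memory is passed in as a plain number of GB.

```python
# Illustrative sketch only: names are assumptions, not Elasticsearch code.
DEFAULT_MB = 40
HOT_LIKE_ROLES = {"data", "data_hot", "data_warm", "data_content"}
COLD_LIKE_ROLES = {"data_cold", "data_frozen"}


def default_max_mb_per_sec(roles, jvm_major_version, memory_gb):
    """Return the default recovery rate limit in MB for a node."""
    roles = set(roles)
    # Non-data nodes keep the base default.
    if not roles & (HOT_LIKE_ROLES | COLD_LIKE_ROLES):
        return DEFAULT_MB
    # Data nodes on a JVM older than 14 keep the base default.
    if jvm_major_version < 14:
        return DEFAULT_MB
    # Any hot, warm, content or generic data role keeps the base default.
    if roles & HOT_LIKE_ROLES:
        return DEFAULT_MB
    # Only data_cold and/or data_frozen remain: scale with memory.
    if memory_gb <= 4:
        return 40
    if memory_gb <= 8:
        return 60
    if memory_gb <= 16:
        return 90
    if memory_gb <= 32:
        return 125
    return 250
```

For example, `default_max_mb_per_sec(["data_cold"], 17, 16)` falls in the "more than 8 GB and less than or equal to 16 GB" tier.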

While those defaults served us well, we want to evaluate whether we can define
more appropriate defaults if Elasticsearch knew more about the limits
(or properties) of the hardware it is running on. This is something that Elasticsearch
cannot detect by itself but can derive from settings that are provided at startup.

This pull request introduces the following new node settings:

    node.bandwidth.recovery.network
    node.bandwidth.recovery.disk.read
    node.bandwidth.recovery.disk.write

Those settings are not dynamic and must be set before the node starts.
When they are set, Elasticsearch determines the minimum of the network,
disk read and disk write bandwidths and computes a maximum bytes per second
limit that is a fraction of that minimum bandwidth. By default 40% of the
minimum bandwidth is used, but that can be dynamically configured by an operator
(using the node.bandwidth.recovery.operator.factor setting) or by the user
directly (using a different setting node.bandwidth.recovery.factor).
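
As a rough sketch of the computation described above, the bandwidth-based limit is the configured factor applied to the smallest of the three bandwidths. The function and parameter names are hypothetical, and all bandwidths are assumed to be plain numbers in bytes per second.

```python
# Illustrative sketch only: names are assumptions, not Elasticsearch code.
def bandwidth_based_limit(network, disk_read, disk_write, factor=0.4):
    """Return `factor` times the smallest of the three bandwidths.

    All bandwidths are expected in the same unit (bytes per second here).
    The 0.4 default mirrors the 40% mentioned in the description.
    """
    min_bandwidth = min(network, disk_read, disk_write)
    return factor * min_bandwidth
```

With a slow disk write path, for instance `bandwidth_based_limit(1000, 500, 250)`, the disk write bandwidth is the one that drives the result.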

The limit computed from available bandwidths is then compared to pre-existing
limits such as the one set through the indices.recovery.max_bytes_per_sec setting
or the one that Elasticsearch computes from the node's physical memory
on dedicated cold/frozen nodes. Elasticsearch will try to use the highest possible
limit among those values, while not exceeding an overcommit ratio that is also
defined through a node setting
(see node.bandwidth.recovery.operator.factor.max_overcommit).

This overcommit ratio prevents the rate limit from being set to a value that is
greater than 100 times (by default) the minimum available bandwidth.
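
One possible reading of the overcommit rule, sketched below, is to take the highest candidate limit and cap it at the overcommit ratio times the minimum available bandwidth. This is an interpretation of the description, not the actual Elasticsearch logic; the function and parameter names are hypothetical, and all values are assumed to share the same unit.

```python
# Illustrative sketch only: one interpretation of the description,
# not the Elasticsearch implementation.
def effective_limit(candidate_limits, min_bandwidth, max_overcommit=100):
    """Pick the highest candidate limit, capped by the overcommit ceiling.

    candidate_limits: limits to compare, e.g. the bandwidth-based limit,
        an explicitly configured limit, or a memory-based default.
    min_bandwidth: smallest of the network, disk read and disk write bandwidths.
    max_overcommit: ratio applied to min_bandwidth (100 by default, as above).
    """
    ceiling = max_overcommit * min_bandwidth
    return min(max(candidate_limits), ceiling)
```

Under this reading, a very large configured limit would be clamped to the ceiling, while ordinary values pass through unchanged.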

Backport of elastic#82819 for 7.17.1
…tic#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings and to the list of settings to
consume for updates.

Relates elastic#82819
@tlrx tlrx added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. backport v7.17.1 labels Feb 2, 2022
@tlrx tlrx marked this pull request as ready for review February 4, 2022 08:30
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Feb 4, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@henningandersen henningandersen left a comment

LGTM.

@@ -0,0 +1,6 @@
pr: 82819
summary: "[Draft] Adjust `indices.recovery.max_bytes_per_sec` according to external\
Contributor

Can we remove [Draft] here (and preferably in 8.1 too)?

Member Author

I opened #83527 for 8.1, thanks for catching this

pr: 83350
summary: Add missing max overcommit factor to list of (dynamic) settings
area: Recovery
type: bug
Contributor

Perhaps we can remove this changelog entry?

Member Author

Sure, will do

Member

@DaveCTurner DaveCTurner left a comment

LGTM

@tlrx tlrx added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 9, 2022
@elasticsearchmachine elasticsearchmachine merged commit 07b9951 into elastic:7.17 Feb 9, 2022
@tlrx tlrx deleted the backport-node-bandwidth-recovery-settings-on-7.17.1 branch February 9, 2022 11:33
@tlrx
Member Author

tlrx commented Feb 9, 2022

Thanks Henning and David

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. Team:Distributed Meta label for distributed team. v7.17.1

5 participants