
Add docs for node bandwith settings #83361

Merged
DaveCTurner merged 3 commits into elastic:master from DaveCTurner:2022-02-01-bandwidth-settings-docs
Feb 1, 2022

Conversation

@DaveCTurner
Member

Relates #82819

@DaveCTurner added the >docs, :Distributed/Recovery, and v8.1.0 labels on Feb 1, 2022
@elasticmachine added the Team:Docs and Team:Distributed labels on Feb 1, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Member Author

DaveCTurner commented Feb 1, 2022

Member

@tlrx tlrx left a comment


LGTM

@DaveCTurner DaveCTurner merged commit a062bdf into elastic:master Feb 1, 2022
@DaveCTurner DaveCTurner deleted the 2022-02-01-bandwidth-settings-docs branch February 1, 2022 12:19
tlrx pushed a commit to tlrx/elasticsearch that referenced this pull request Feb 2, 2022
tlrx pushed a commit to tlrx/elasticsearch that referenced this pull request Feb 2, 2022
elasticsearchmachine pushed a commit that referenced this pull request Feb 9, 2022
…nal settings (#83413)

* Adjust indices.recovery.max_bytes_per_sec according to external settings

Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node's roles, the JVM version, and the total system
memory that can be detected.

The current logic to set the default value can be summarized as:

    40 MB for non-data nodes
    40 MB for data nodes that run on a JVM version < 14
    40 MB for data nodes that have one of the data_hot, data_warm, data_content, or data roles

Nodes whose only data roles are data_cold and/or data_frozen have a
default value that depends on the available memory:

    with ≤ 4 GB of available memory, the default is 40 MB
    with more than 4 GB and up to 8 GB, the default is 60 MB
    with more than 8 GB and up to 16 GB, the default is 90 MB
    with more than 16 GB and up to 32 GB, the default is 125 MB
    above 32 GB, the default is 250 MB
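
The memory tiers above can be sketched as a small function. This is only an illustration of the described defaults, not the actual Elasticsearch implementation, and the function name is made up:

```python
def default_recovery_max_bytes_per_sec_mb(available_memory_gb: float) -> int:
    """Default indices.recovery.max_bytes_per_sec (in MB) for nodes whose
    only data roles are data_cold and/or data_frozen, per the tiers above.
    Illustrative sketch only."""
    if available_memory_gb <= 4:
        return 40
    elif available_memory_gb <= 8:
        return 60
    elif available_memory_gb <= 16:
        return 90
    elif available_memory_gb <= 32:
        return 125
    else:
        return 250
```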

While those defaults have served us well, we want to evaluate whether we can
define more appropriate defaults if Elasticsearch knew more about the limits
(or properties) of the hardware it runs on - something Elasticsearch cannot
detect by itself but can derive from settings provided at startup.

This pull request introduces the following new node settings:

    node.bandwidth.recovery.network
    node.bandwidth.recovery.disk.read
    node.bandwidth.recovery.disk.write
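
Since these are static settings, they would go in elasticsearch.yml. A hypothetical example follows; the bandwidth values are made up, and the byte-size value format is assumed to match similar settings such as indices.recovery.max_bytes_per_sec:

```yaml
# Hypothetical example values - the actual bandwidths depend on your hardware.
# Value format assumed to follow other byte-size settings (per second).
node.bandwidth.recovery.network: 100mb
node.bandwidth.recovery.disk.read: 100mb
node.bandwidth.recovery.disk.write: 100mb
```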

Those settings are not dynamic and must be set before the node starts.
When they are set, Elasticsearch detects the minimum of the available
network, disk read, and disk write bandwidths and computes a maximum
bytes-per-second limit as a fraction of that minimum. By default 40% of
the minimum bandwidth is used, but this can be dynamically configured by
an operator (using the node.bandwidth.recovery.operator.factor setting)
or by the user directly (using a separate setting, node.bandwidth.recovery.factor).
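
The fraction-of-minimum computation described above can be sketched as follows (hypothetical helper name; values in bytes per second):

```python
def bandwidth_based_recovery_limit(network_bps: float,
                                   disk_read_bps: float,
                                   disk_write_bps: float,
                                   factor: float = 0.4) -> float:
    """Recovery rate limit as a fraction (default 40%) of the minimum
    available bandwidth, as described above. Illustrative sketch only."""
    min_bandwidth = min(network_bps, disk_read_bps, disk_write_bps)
    return factor * min_bandwidth
```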

The limit computed from the available bandwidths is then compared to pre-existing
limits such as the one set through the indices.recovery.max_bytes_per_sec setting
or the one Elasticsearch computes from the node's physical memory on dedicated
cold/frozen nodes. Elasticsearch tries to use the highest of those limits, while
not exceeding an overcommit ratio that is also defined through a node setting
(see node.bandwidth.recovery.operator.factor.max_overcommit).

This overcommit ratio prevents the rate limit from being set to a value greater
than 100 times (by default) the minimum available bandwidth.
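
Combining the two paragraphs above, the final limit selection might look like this sketch (hypothetical names, using the 100x default overcommit ratio):

```python
def effective_recovery_limit(bandwidth_based_limit: float,
                             pre_existing_limits: list[float],
                             min_bandwidth: float,
                             max_overcommit: float = 100.0) -> float:
    """Use the highest of the bandwidth-derived and pre-existing limits,
    but never more than max_overcommit times the minimum available
    bandwidth. Illustrative sketch of the behavior described above."""
    highest = max([bandwidth_based_limit, *pre_existing_limits])
    return min(highest, max_overcommit * min_bandwidth)
```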

Backport of #82819 for 7.17.1

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings or to the list of settings
consumed for updates.

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819

* Add docs for node bandwith settings (#83361)

Relates #82819

* Adjust for 7.17.1

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>
elasticsearchmachine pushed a commit that referenced this pull request Feb 9, 2022
…al settings (#83414)

* Adjust indices.recovery.max_bytes_per_sec according to external settings (#82819)


* Add missing max overcommit factor to list of (dynamic) settings (#83350)

* Add docs for node bandwith settings (#83361)

* Operator factor settings should have the OperatorDynamic setting property (#83359)

* Document and test operator-only node bandwidth recovery settings (#83372)

This commit updates the Operator-only functionality doc to
mention the operator-only settings introduced in #82819.

It also adds an integration test for those operator-only
settings that would have caught #83359.

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>

Labels

>docs General docs changes
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source.
Team:Distributed Meta label for distributed team.
Team:Docs Meta label for docs team
v8.1.0
