
Add missing max overcommit factor to list of (dynamic) settings #83350

Merged
tlrx merged 3 commits into elastic:master from tlrx:overcommit on Feb 1, 2022

Conversation

tlrx (Member) commented on Feb 1, 2022

The setting node.bandwidth.recovery.operator.factor.max_overcommit wasn't added to the list of cluster settings or to the list of settings consumed for updates.

Relates #82819
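
For context, a dynamic node setting in Elasticsearch only becomes updatable at runtime once it is registered in the cluster settings list and wired to an update consumer. The sketch below illustrates that pattern; the names, defaults and bounds are assumptions for illustration, not the exact production code (in practice the setting is declared alongside the other node.bandwidth.recovery.* settings and registered via ClusterSettings.BUILT_IN_CLUSTER_SETTINGS):

    import org.elasticsearch.common.settings.ClusterSettings;
    import org.elasticsearch.common.settings.Setting;
    import org.elasticsearch.common.settings.Setting.Property;

    // Illustrative declaration of the max overcommit factor setting.
    public static final Setting<Double> MAX_OVERCOMMIT_FACTOR_SETTING = Setting.doubleSetting(
        "node.bandwidth.recovery.operator.factor.max_overcommit",
        100d, // default: the limit may not exceed 100x the minimum available bandwidth
        1d,   // assumed lower bound, for illustration only
        Property.Dynamic,
        Property.NodeScope
    );

    // The gist of this fix: the setting must appear in the registered cluster settings
    // (otherwise dynamic updates are rejected as an unknown setting) and must have an
    // update consumer so that runtime changes actually take effect.
    void registerMaxOvercommitUpdates(ClusterSettings clusterSettings) {
        clusterSettings.addSettingsUpdateConsumer(
            MAX_OVERCOMMIT_FACTOR_SETTING,
            this::setMaxOvercommitFactor // hypothetical consumer method
        );
    }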

tlrx added the >bug, :Distributed/Recovery (Anything around constructing a new shard, either from a local or a remote source) and v8.1.0 labels on Feb 1, 2022
elasticmachine added the Team:Distributed (Meta label for distributed team) label on Feb 1, 2022
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

tlrx requested a review from henningandersen on February 1, 2022 at 08:53
elasticsearchmachine (Collaborator) commented:

Hi @tlrx, I've created a changelog YAML for you.

henningandersen (Contributor) left a comment

LGTM, though I think we should mark this non-issue since it was never released.

tlrx added the >non-issue label and removed the >bug label on Feb 1, 2022
tlrx (Member, Author) commented on Feb 1, 2022

@elasticmachine update branch

tlrx merged commit 086c6e8 into elastic:master on Feb 1, 2022
tlrx deleted the overcommit branch on February 1, 2022 at 10:59
tlrx (Member, Author) commented on Feb 1, 2022

Thanks Henning

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Feb 2, 2022
Add missing max overcommit factor to list of (dynamic) settings (elastic#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings and to the list of settings to
consume for updates.

Relates elastic#82819
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Feb 2, 2022
Add missing max overcommit factor to list of (dynamic) settings (elastic#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit 
wasn't added to the list of cluster settings and to the list of settings to 
consume for updates.

Relates elastic#82819
elasticsearchmachine pushed a commit that referenced this pull request Feb 9, 2022
Adjust indices.recovery.max_bytes_per_sec according to external settings (#83413)

* Adjust indices.recovery.max_bytes_per_sec according to external settings

Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node roles, the JVM version and the total system
memory that can be detected.

The current logic to set the default value can be summarized as:

    40 MB for non-data nodes
    40 MB for data nodes that run on a JVM version < 14
    40 MB for data nodes that have one of the data_hot, data_warm, data_content or data roles

Nodes with only data_cold and/or data_frozen roles as data roles have a
default value that depends on the available memory (see the sketch after the list below):

    with ≤ 4 GB of available memory, the default is 40 MB
    with more than 4 GB and less than or equal to 8 GB, the default is 60 MB
    with more than 8 GB and less than or equal to 16 GB, the default is 90 MB
    with more than 16 GB and less than or equal to 32 GB, the default is 125 MB
    and above 32 GB, the default is 250 MB
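
As a rough illustration of those tiers (method and names are made up for readability, not the actual implementation):

    import org.elasticsearch.common.unit.ByteSizeValue;

    // Illustrative mapping from the detected total memory to the default
    // indices.recovery.max_bytes_per_sec on dedicated cold/frozen data nodes.
    static ByteSizeValue defaultRecoveryRateForColdFrozen(ByteSizeValue totalMemory) {
        long mem = totalMemory.getBytes();
        if (mem <= ByteSizeValue.ofGb(4).getBytes()) {
            return ByteSizeValue.ofMb(40);
        } else if (mem <= ByteSizeValue.ofGb(8).getBytes()) {
            return ByteSizeValue.ofMb(60);
        } else if (mem <= ByteSizeValue.ofGb(16).getBytes()) {
            return ByteSizeValue.ofMb(90);
        } else if (mem <= ByteSizeValue.ofGb(32).getBytes()) {
            return ByteSizeValue.ofMb(125);
        } else {
            return ByteSizeValue.ofMb(250);
        }
    }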

While those defaults have served us well, we want to evaluate whether we can define
more appropriate defaults if Elasticsearch knew more about the limits
(or properties) of the hardware it is running on - something that Elasticsearch
cannot detect by itself but can derive from settings that are provided at startup.

This pull request introduces the following new node settings:

    node.bandwidth.recovery.network
    node.bandwidth.recovery.disk.read
    node.bandwidth.recovery.disk.write

Those settings are not dynamic and must be set before the node starts.
When they are set, Elasticsearch detects the minimum available bandwidth
among the network, disk read and disk write bandwidths and computes
a maximum bytes-per-second limit as a fraction of that minimum available
bandwidth. By default 40% of the minimum bandwidth is used, but this factor
can be configured dynamically by an operator
(using the node.bandwidth.recovery.operator.factor setting) or by the user
directly (using the separate node.bandwidth.recovery.factor setting).

The limit computed from the available bandwidths is then compared to pre-existing
limits such as the one set through the indices.recovery.max_bytes_per_sec setting
or the one that Elasticsearch computes from the node's physical memory
on dedicated cold/frozen nodes. Elasticsearch tries to use the highest possible
limit among those values, while not exceeding an overcommit ratio that is also
defined through a node setting
(see node.bandwidth.recovery.operator.factor.max_overcommit).

This overcommit ratio prevents the rate limit from being set to a value greater
than 100 times (by default) the minimum available bandwidth.
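
Putting the pieces above together, the effective limit can be thought of roughly as
follows (a simplified sketch with illustrative names, not the actual recovery settings code):

    // Simplified sketch of the limit derivation described above.
    static long effectiveMaxBytesPerSec(
        long networkBps,          // node.bandwidth.recovery.network
        long diskReadBps,         // node.bandwidth.recovery.disk.read
        long diskWriteBps,        // node.bandwidth.recovery.disk.write
        double factor,            // operator or user factor, 0.4 by default
        double maxOvercommit,     // node.bandwidth.recovery.operator.factor.max_overcommit, 100 by default
        long preExistingLimitBps  // e.g. indices.recovery.max_bytes_per_sec or the memory-based default
    ) {
        long minBandwidth = Math.min(networkBps, Math.min(diskReadBps, diskWriteBps));
        long bandwidthBasedLimit = (long) (factor * minBandwidth);
        // Use the highest of the candidate limits...
        long limit = Math.max(bandwidthBasedLimit, preExistingLimitBps);
        // ...but never exceed the overcommit ceiling relative to the minimum available bandwidth.
        return Math.min(limit, (long) (maxOvercommit * minBandwidth));
    }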

Backport of #82819 for 7.17.1

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings and to the list of settings to
consume for updates.

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819
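
For reference, OperatorDynamic is a setting property applied when the setting is declared; a hedged sketch of what the operator factor setting might look like with it (illustrative values, not the exact source):

    // Illustrative: OperatorDynamic means only operators may update the setting at
    // runtime, while it remains node-scoped like the other bandwidth settings.
    public static final Setting<Double> OPERATOR_FACTOR_SETTING = Setting.doubleSetting(
        "node.bandwidth.recovery.operator.factor",
        0.40d, // default: 40% of the minimum available bandwidth
        0.0d,  // assumed lower bound, for illustration only
        Setting.Property.OperatorDynamic,
        Setting.Property.NodeScope
    );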

* Add docs for node bandwidth settings (#83361)

Relates #82819

* Adjust for 7.17.1

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>
elasticsearchmachine pushed a commit that referenced this pull request Feb 9, 2022
Adjust indices.recovery.max_bytes_per_sec according to external settings (#83414)

* Adjust indices.recovery.max_bytes_per_sec according to external settings (#82819)

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit 
wasn't added to the list of cluster settings and to the list of settings to 
consume for updates.

Relates #82819

* Add docs for node bandwidth settings (#83361)

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819

* Document and test operator-only node bandwidth recovery settings (#83372)

This commit updates the Operator-only functionality doc to
mention the operator-only settings introduced in #82819.

It also adds an integration test for those operator-only
settings that would have caught #83359.

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>
