
Operator factor settings should have the OperatorDynamic setting property#83359

Merged
elasticsearchmachine merged 2 commits into elastic:master from tlrx:operator-settings on Feb 1, 2022

Conversation

@tlrx
Member

@tlrx tlrx commented Feb 1, 2022

Relates #82819

@tlrx tlrx added >non-issue :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v8.1.0 labels Feb 1, 2022
@tlrx tlrx requested a review from DaveCTurner February 1, 2022 10:59
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Feb 1, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Member

@DaveCTurner DaveCTurner left a comment


LGTM

@tlrx
Member Author

tlrx commented Feb 1, 2022

@elasticmachine run elasticsearch-ci/part-1

@tlrx
Member Author

tlrx commented Feb 1, 2022

@elasticmachine update branch

@tlrx tlrx added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 1, 2022
@elasticsearchmachine elasticsearchmachine merged commit 4f1e779 into elastic:master Feb 1, 2022
@tlrx tlrx deleted the operator-settings branch February 1, 2022 13:30
@tlrx
Member Author

tlrx commented Feb 1, 2022

Thanks David and Ievgen

tlrx added a commit that referenced this pull request Feb 2, 2022

This commit updates the Operator-only functionality doc to 
mention the operator only settings introduced in #82819.

It also adds an integration test for those operator only 
settings that would have caught #83359.
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Feb 2, 2022
…stic#83372)

This commit updates the Operator-only functionality doc to 
mention the operator only settings introduced in elastic#82819.

It also adds an integration test for those operator only 
settings that would have caught elastic#83359.
elasticsearchmachine pushed a commit that referenced this pull request Feb 9, 2022
…nal settings (#83413)

* Adjust indices.recovery.max_bytes_per_sec according to external settings

Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node roles, the JVM version, and the total system
memory that can be detected.

The current logic to set the default value can be summarized as:

    40 MB for non-data nodes
    40 MB for data nodes that run on a JVM version < 14
    40 MB for data nodes that have one of the data_hot, data_warm, data_content or data roles

Nodes with only data_cold and/or data_frozen roles as data roles have a
default value that depends on the available memory:

    with ≤ 4 GB of available memory, the default is 40 MB
    with more than 4 GB and less than or equal to 8 GB, the default is 60 MB
    with more than 8 GB and less than or equal to 16 GB, the default is 90 MB
    with more than 16 GB and less than or equal to 32 GB, the default is 125 MB
    and above 32 GB, the default is 250 MB
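The tiered defaults above can be sketched as a small function. This is a hypothetical re-implementation for illustration only; the function name and parameters are not Elasticsearch's, and the actual logic lives in the Java codebase.

```python
# Illustrative sketch of the default recovery bandwidth logic described
# above. Names and signature are hypothetical, not Elasticsearch code.
def default_max_bytes_per_sec_mb(is_data_node: bool, jvm_version: int,
                                 roles: set, total_memory_gb: float) -> int:
    # Only dedicated cold/frozen data nodes get a memory-scaled default.
    cold_only = bool(roles) and roles <= {"data_cold", "data_frozen"}
    if not is_data_node or jvm_version < 14 or not cold_only:
        return 40
    if total_memory_gb <= 4:
        return 40
    if total_memory_gb <= 8:
        return 60
    if total_memory_gb <= 16:
        return 90
    if total_memory_gb <= 32:
        return 125
    return 250
```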

While those defaults have served us well, we want to evaluate whether we can
define more appropriate defaults if Elasticsearch knew more about the limits
(or properties) of the hardware it is running on - something that Elasticsearch
cannot detect by itself but can derive from settings provided at startup.

This pull request introduces the following new node settings:

    node.bandwidth.recovery.network
    node.bandwidth.recovery.disk.read
    node.bandwidth.recovery.disk.write
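A minimal elasticsearch.yml sketch of these settings; the setting names come from this PR, while the values are purely illustrative, not recommendations:

```
# elasticsearch.yml - illustrative values only
node.bandwidth.recovery.network: 100mb
node.bandwidth.recovery.disk.read: 250mb
node.bandwidth.recovery.disk.write: 250mb
```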

Those settings are not dynamic and must be set before the node starts.
When they are set, Elasticsearch detects the minimum of the available
network, disk read and disk write bandwidths and computes a maximum
bytes-per-second limit that is a fraction of that minimum. By default
40% of the minimum available bandwidth is used, but this can be
dynamically configured by an operator
(using the node.bandwidth.recovery.operator.factor setting) or by the user
directly (using a different setting, node.bandwidth.recovery.factor).

The limit computed from the available bandwidths is then compared to pre-existing
limitations such as the one set through the indices.recovery.max_bytes_per_sec setting
or the one computed by Elasticsearch from the node's physical memory
on dedicated cold/frozen nodes. Elasticsearch tries to use the highest possible
limit among those values, while not exceeding an overcommit ratio that is also
defined through a node setting
(see node.bandwidth.recovery.operator.factor.max_overcommit).

This overcommit ratio prevents the rate limit from being set to a value
greater than 100 times (by default) the minimum available bandwidth.
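The computation described above - a fraction of the minimum available bandwidth, raised to any higher pre-existing limit but capped by the overcommit ratio - can be sketched as follows. This is an illustrative model, not the actual Elasticsearch implementation; the function name and parameters are hypothetical.

```python
# Illustrative sketch of the rate-limit computation described above.
# Names and structure are hypothetical, not Elasticsearch code.
def recovery_rate_limit(network_bps: float, disk_read_bps: float,
                        disk_write_bps: float, factor: float = 0.40,
                        existing_limit_bps: float = 0.0,
                        max_overcommit: float = 100.0) -> float:
    # The limit is derived from the slowest of the declared bandwidths.
    min_bandwidth = min(network_bps, disk_read_bps, disk_write_bps)
    computed = factor * min_bandwidth
    # Prefer the highest pre-existing limit, but never exceed the
    # overcommit ceiling relative to the minimum available bandwidth.
    return min(max(computed, existing_limit_bps),
               max_overcommit * min_bandwidth)
```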

Backport of #82819 for 7.17.1

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings or to the list of settings
consumed for updates.

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819

* Add docs for node bandwidth settings (#83361)

Relates #82819

* Adjust for 7.17.1

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>
elasticsearchmachine pushed a commit that referenced this pull request Feb 9, 2022
…al settings (#83414)

* Adjust indices.recovery.max_bytes_per_sec according to external settings (#82819)

Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node roles, the JVM version, and the total system
memory that can be detected.

The current logic to set the default value can be summarized as:

    40 MB for non-data nodes
    40 MB for data nodes that run on a JVM version < 14
    40 MB for data nodes that have one of the data_hot, data_warm, data_content or data roles

Nodes with only data_cold and/or data_frozen roles as data roles have a
default value that depends on the available memory:

    with ≤ 4 GB of available memory, the default is 40 MB
    with more than 4 GB and less than or equal to 8 GB, the default is 60 MB
    with more than 8 GB and less than or equal to 16 GB, the default is 90 MB
    with more than 16 GB and less than or equal to 32 GB, the default is 125 MB
    and above 32 GB, the default is 250 MB

While those defaults have served us well, we want to evaluate whether we can
define more appropriate defaults if Elasticsearch knew more about the limits
(or properties) of the hardware it is running on - something that Elasticsearch
cannot detect by itself but can derive from settings provided at startup.

This pull request introduces the following new node settings:

    node.bandwidth.recovery.network
    node.bandwidth.recovery.disk.read
    node.bandwidth.recovery.disk.write

Those settings are not dynamic and must be set before the node starts.
When they are set, Elasticsearch detects the minimum of the available
network, disk read and disk write bandwidths and computes a maximum
bytes-per-second limit that is a fraction of that minimum. By default
40% of the minimum available bandwidth is used, but this can be
dynamically configured by an operator
(using the node.bandwidth.recovery.operator.factor setting) or by the user
directly (using a different setting, node.bandwidth.recovery.factor).
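As a hypothetical illustration, a dynamic factor setting can be updated through the cluster settings API; updating the operator variant requires an operator user where operator privileges are enforced. The value shown is illustrative:

```
PUT _cluster/settings
{
  "persistent": {
    "node.bandwidth.recovery.operator.factor": 0.5
  }
}
```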

The limit computed from the available bandwidths is then compared to pre-existing
limitations such as the one set through the indices.recovery.max_bytes_per_sec setting
or the one computed by Elasticsearch from the node's physical memory
on dedicated cold/frozen nodes. Elasticsearch tries to use the highest possible
limit among those values, while not exceeding an overcommit ratio that is also
defined through a node setting
(see node.bandwidth.recovery.operator.factor.max_overcommit).

This overcommit ratio prevents the rate limit from being set to a value
greater than 100 times (by default) the minimum available bandwidth.

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings or to the list of settings
consumed for updates.

Relates #82819

* Add docs for node bandwidth settings (#83361)

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819

* Document and test operator-only node bandwidth recovery settings (#83372)

This commit updates the Operator-only functionality doc to 
mention the operator only settings introduced in #82819.

It also adds an integration test for those operator only 
settings that would have caught #83359.

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>