Autoscaling reactive storage decider #65520
The reactive storage decider requests additional capacity proportional to the size of shards that are either:
* unassigned and unable to be allocated, with storage on a node being the only reason
* unable to remain where they are, with storage being the only reason, and unable to be allocated anywhere else
* unable to remain where they are and unable to be allocated on any node, where at least one node has storage as the only reason for being unable to allocate

The reactive storage decider does not try to look into the future, so by the time it asks to scale up, the cluster is already in need of more storage.
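The three shard categories in the description can be read as a filter followed by a sum. The following is a minimal, self-contained sketch of that reading, not the code from this PR: `ShardView`, its fields, `countsTowardScaleUp`, and `requiredExtraBytes` are hypothetical names introduced for illustration, and the actual decider works on Elasticsearch's routing and allocation types rather than precomputed booleans.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical summary of one shard's allocation outcome; not a type from the PR.
final class ShardView {
    final long sizeInBytes;
    final boolean assigned;                   // false for an unassigned shard
    final boolean canRemain;                  // only meaningful when assigned
    final boolean remainBlockedOnlyByStorage; // storage is the only reason it cannot remain
    final boolean allocatableSomewhere;       // at least one node would accept the shard
    final boolean someNodeOnlyStorage;        // at least one node refuses for storage alone

    ShardView(long sizeInBytes, boolean assigned, boolean canRemain,
              boolean remainBlockedOnlyByStorage, boolean allocatableSomewhere,
              boolean someNodeOnlyStorage) {
        this.sizeInBytes = sizeInBytes;
        this.assigned = assigned;
        this.canRemain = canRemain;
        this.remainBlockedOnlyByStorage = remainBlockedOnlyByStorage;
        this.allocatableSomewhere = allocatableSomewhere;
        this.someNodeOnlyStorage = someNodeOnlyStorage;
    }
}

final class ReactiveStorageSketch {
    // Assumed reading of the three categories in the PR description.
    static boolean countsTowardScaleUp(ShardView s) {
        if (s.allocatableSomewhere) {
            return false; // every category requires that no node can take the shard
        }
        if (!s.assigned) {
            return s.someNodeOnlyStorage; // category 1: unassigned, blocked by storage
        }
        if (s.canRemain) {
            return false; // an assigned shard that can stay needs no extra capacity
        }
        // categories 2 and 3: cannot remain and cannot move anywhere else
        return s.remainBlockedOnlyByStorage || s.someNodeOnlyStorage;
    }

    // Sums the sizes of all shards that fall into one of the three categories.
    static long requiredExtraBytes(List<ShardView> shards) {
        long total = 0;
        for (ShardView s : shards) {
            if (countsTowardScaleUp(s)) {
                total += s.sizeInBytes;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<ShardView> shards = Arrays.asList(
            new ShardView(100, false, false, false, false, true), // unassigned, storage only
            new ShardView(800, true, true, false, false, false)); // assigned and can remain
        System.out.println(requiredExtraBytes(shards));
    }
}
```

Because the sum only covers shards that are already stuck, the sketch reflects the reactive nature described above: it reports a shortfall that exists now rather than a forecast.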
Extracted DiskUsageIntegTestCase from DiskThresholdDeciderIT to allow other tests to easily test functionality relying on disk usage. Relates elastic#65520
...erTest/java/org/elasticsearch/cluster/routing/allocation/decider/DiskThresholdDeciderIT.java
test/framework/src/main/java/org/elasticsearch/cluster/DiskUsageIntegTestCase.java
@elasticmachine update branch
jasontedor left a comment:
This is really great work. I left a few minor comments, but no need for another round.
ClusterInfo info();

SnapshotShardSizeInfo snapshotShardSizeInfo();
Could you add Javadocs to these new methods, and also state?
DataTier.DATA_CONTENT_NODE_ROLE,
DataTier.DATA_HOT_NODE_ROLE,
DataTier.DATA_WARM_NODE_ROLE,
DataTier.DATA_COLD_NODE_ROLE
We'll probably want a test that collects all the roles that return true for DiscoveryNodeRole#canContainData and ensures they are included in this list. I'm thinking of when we add a role for frozen: such a test would ensure that this list is maintained properly.
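A rough sketch of the kind of check suggested here, written against stand-in types so it runs on its own. `RoleSketch` and `missingDataRoles` are hypothetical; a real test would iterate the registered `DiscoveryNodeRole` instances and call `canContainData` on each. The role name `data_frozen` is used purely to illustrate a role added later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for DiscoveryNodeRole: only a name and the data flag are modeled.
final class RoleSketch {
    final String name;
    final boolean canContainData;

    RoleSketch(String name, boolean canContainData) {
        this.name = name;
        this.canContainData = canContainData;
    }
}

final class DataRoleCoverageSketch {
    // Returns the names of data-capable roles that the decider's role list does not cover.
    static List<String> missingDataRoles(Collection<RoleSketch> allRoles, Collection<String> deciderRoles) {
        List<String> missing = new ArrayList<>();
        for (RoleSketch role : allRoles) {
            if (role.canContainData && !deciderRoles.contains(role.name)) {
                missing.add(role.name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> deciderRoles = Arrays.asList("data_content", "data_hot", "data_warm", "data_cold");
        List<RoleSketch> allRoles = Arrays.asList(
            new RoleSketch("master", false),
            new RoleSketch("data_content", true),
            new RoleSketch("data_hot", true),
            new RoleSketch("data_warm", true),
            new RoleSketch("data_cold", true),
            new RoleSketch("data_frozen", true)); // illustrative later addition
        // A test would assert that this list is empty; here the new role shows up as missing.
        System.out.println(missingDataRoles(allRoles, deciderRoles));
    }
}
```

Driving the check from the full role set, instead of hard-coding expectations, is what makes it fail automatically when a new data-capable role is introduced without updating the list.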
}

static boolean isDiskOnlyNoDecision(Decision decision) {
    // we consider throttling==yes, throttling should be temporary.
 */
private boolean cannotAllocateDueToStorage(ShardRouting shard, RoutingAllocation allocation) {
    assert allocation.debugDecision() == false;
    allocation.debugDecision(true);
Can you leave a comment explaining why we need to enable allocation debugging here?
...src/main/java/org/elasticsearch/xpack/autoscaling/storage/ReactiveStorageDeciderService.java
assert assigned >= 0;
assert unassigned >= 0;
assert maxShard >= 0;
String message = unassigned > 0 || assigned > 0 ? "not enough storage available, needs " + (unassigned + assigned) : "storage ok";
I wonder if this should be human readable bytes? So new ByteSizeValue(unassigned + assigned).toString()?
And if not, bytes should be appended to the message.
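To make the suggestion concrete, here is a small standalone sketch of the message with a human-readable size. `humanReadable` is a hypothetical helper written for this example; it is not `ByteSizeValue`, whose `toString()` may choose units and rounding differently.

```java
import java.util.Locale;

final class StorageMessageSketch {
    // Hypothetical formatter using 1024-based units; stands in for ByteSizeValue#toString.
    static String humanReadable(long bytes) {
        String[] units = {"b", "kb", "mb", "gb", "tb", "pb"};
        if (bytes < 1024) {
            return bytes + units[0]; // plain byte counts need no decimal places
        }
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f%s", value, units[unit]);
    }

    // Same condition as the line under review, with the raw sum replaced by a readable size.
    static String message(long unassigned, long assigned) {
        if (unassigned < 0 || assigned < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        return unassigned > 0 || assigned > 0
            ? "not enough storage available, needs " + humanReadable(unassigned + assigned)
            : "storage ok";
    }

    public static void main(String[] args) {
        System.out.println(message(0, 0));
        System.out.println(message(1024, 512));
    }
}
```

Either variant discussed in the review, a formatted size or an explicit "bytes" suffix, removes the ambiguity of a bare number in the message.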
* elastic/master: Autoscaling reactive storage decider (elastic#65520) Fix TranslogTests#testStats (elastic#66227)