Autoscaling reactive storage decider #65520

Merged
henningandersen merged 40 commits into elastic:master from
henningandersen:enhance_reactive_storage_autoscaler_pr_final
Dec 13, 2020

Conversation

@henningandersen
Contributor

The reactive storage decider will request additional capacity
proportional to the size of shards that are either:

  • unassigned and unable to be allocated to any node, with storage
    being the only blocking reason
  • unable to remain on their current node, with storage being the only
    reason, and unable to be allocated anywhere else
  • unable to remain on their current node and unable to be allocated
    to any node, with at least one node refusing allocation for storage
    as its only reason.

The reactive storage decider does not try to look into the future, so
at the time it asks to scale up, the cluster is already in need of more
storage.

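Under stated assumptions, the sizing rule above can be sketched in plain Java. Everything here is invented for illustration (the real decider works against `RoutingAllocation` and `ClusterInfo`, not a simple record): the three bullet conditions collapse into a single "blocked only by storage" flag, and the requested extra capacity is the sum of the affected shard sizes.

```java
import java.util.List;

// Invented, heavily simplified model: the decider requests extra capacity
// equal to the total size of shards whose only allocation blocker is storage.
record Shard(long sizeInBytes, boolean blockedOnlyByStorage) {}

public class ReactiveStorageSketch {
    public static long requiredExtraBytes(List<Shard> problemShards) {
        return problemShards.stream()
                .filter(Shard::blockedOnlyByStorage) // conditions 1-3 above collapse to this flag
                .mapToLong(Shard::sizeInBytes)
                .sum();
    }
}
```

Note the "reactive" framing: the number is derived purely from the cluster's current state, with no forecasting of future growth.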
@henningandersen henningandersen added >non-issue v8.0.0 :Distributed/Autoscaling Automatically adding or removing nodes in a cluster v7.11.0 labels Nov 25, 2020
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Nov 26, 2020
Extracted DiskUsageIntegTestCase from DiskThresholdDeciderIT to allow
other tests to easily test functionality relying on disk usage.

Relates elastic#65520
allocationDeciders are now given to the service at construction time.
A few test fixes.
Remove context.roles().
Fix unmovable test.
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Dec 4, 2020
henningandersen added a commit that referenced this pull request Dec 4, 2020
@henningandersen
Contributor Author

@elasticmachine update branch

@henningandersen
Contributor Author

@elasticmachine update branch

Member

@jasontedor left a comment


This is really great work. I left a few minor comments, but no need for another round.


ClusterInfo info();

SnapshotShardSizeInfo snapshotShardSizeInfo();
Member


Could you add Javadocs to these new methods, and also state?
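A sketch of what such Javadocs might look like. The enclosing interface name, the stub types, and the Javadoc wording below are all invented for illustration; only `info()` and `snapshotShardSizeInfo()` come from the snippet above.

```java
// Stub types so the sketch compiles standalone; the real ones live in Elasticsearch.
interface ClusterInfo {}
interface SnapshotShardSizeInfo {}

interface AllocationDeciderContext {
    /**
     * @return the most recently observed disk usage and shard size information
     *         for the cluster
     */
    ClusterInfo info();

    /**
     * @return sizes of shards being restored from snapshots, for use when no
     *         size is available from {@link ClusterInfo} yet
     */
    SnapshotShardSizeInfo snapshotShardSizeInfo();
}
```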

Contributor Author


👍, 22456be

DataTier.DATA_CONTENT_NODE_ROLE,
DataTier.DATA_HOT_NODE_ROLE,
DataTier.DATA_WARM_NODE_ROLE,
DataTier.DATA_COLD_NODE_ROLE
Member


We'll probably want a test that collects all the roles that return true for DiscoveryNodeRole#canContainData and ensure they are returned in this list. I'm thinking of when we add a role for frozen, ensuring that this list is maintained properly.
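A sketch of such a guard test, using stand-in types (the real test would iterate the actual `DiscoveryNodeRole` instances and check `canContainData`; the `Role` enum and names here are invented): if a new data-capable role such as frozen is added but not listed in the decider's role set, the check fails.

```java
import java.util.EnumSet;
import java.util.Set;

// Stand-in for DiscoveryNodeRole; only the canContainData flag matters here.
enum Role {
    MASTER(false), DATA_CONTENT(true), DATA_HOT(true), DATA_WARM(true), DATA_COLD(true);

    final boolean canContainData;
    Role(boolean canContainData) { this.canContainData = canContainData; }
}

public class DeciderRolesGuard {
    // The list the decider hard-codes; a newly added data role would fail
    // the check below until it is added here too.
    static final Set<Role> DECIDER_ROLES =
            EnumSet.of(Role.DATA_CONTENT, Role.DATA_HOT, Role.DATA_WARM, Role.DATA_COLD);

    public static boolean coversAllDataRoles() {
        for (Role role : Role.values()) {
            if (role.canContainData && DECIDER_ROLES.contains(role) == false) {
                return false;
            }
        }
        return true;
    }
}
```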

Contributor Author


👍, 555991a

}

static boolean isDiskOnlyNoDecision(Decision decision) {
// we consider throttling==yes, throttling should be temporary.
Member


👍
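The rule the comment in the snippet describes can be sketched with stand-in types (the `SubDecision` record, the decider name string, and the enum here are invented; the real code inspects Elasticsearch `Decision` objects): a shard counts as storage-blocked only when the disk threshold decider is the sole decider answering NO, and THROTTLE is treated like YES because throttling should be temporary.

```java
import java.util.List;

// Stand-ins for the real Decision types (invented for illustration).
enum DecisionType { YES, THROTTLE, NO }
record SubDecision(String decider, DecisionType type) {}

public class DiskOnlySketch {
    // True when allocation is blocked and the disk threshold decider is the
    // only decider saying NO; THROTTLE does not count as a blocker.
    public static boolean isDiskOnlyNoDecision(List<SubDecision> decisions) {
        boolean diskSaysNo = false;
        for (SubDecision d : decisions) {
            if (d.type() == DecisionType.NO) {
                if ("disk_threshold".equals(d.decider()) == false) {
                    return false; // something other than storage blocks this shard
                }
                diskSaysNo = true;
            }
        }
        return diskSaysNo;
    }
}
```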

*/
private boolean cannotAllocateDueToStorage(ShardRouting shard, RoutingAllocation allocation) {
assert allocation.debugDecision() == false;
allocation.debugDecision(true);
Member


Can you leave a comment explaining why we need to enable allocation debugging here?
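One plausible reading of the pattern in the snippet, sketched with stand-in types (the real code uses `RoutingAllocation`; the `Allocation` class below is invented): deciders only record explanations for their answers when debug mode is enabled, so it is switched on around the check and restored afterwards since the allocation object is shared.

```java
public class DebugDecisionSketch {
    // Minimal stand-in for RoutingAllocation's debug flag.
    static class Allocation {
        private boolean debug;
        boolean debugDecision() { return debug; }
        void debugDecision(boolean debug) { this.debug = debug; }
    }

    static boolean checkWithDebugEnabled(Allocation allocation) {
        assert allocation.debugDecision() == false;
        allocation.debugDecision(true); // collect per-decider explanations
        try {
            // run the allocation deciders and inspect individual NO reasons here
            return allocation.debugDecision();
        } finally {
            allocation.debugDecision(false); // always restore the shared state
        }
    }
}
```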

Contributor Author


👍, 6a9c5cb

assert assigned >= 0;
assert unassigned >= 0;
assert maxShard >= 0;
String message = unassigned > 0 || assigned > 0 ? "not enough storage available, needs " + (unassigned + assigned) : "storage ok";
Member


I wonder if this should be human readable bytes? So new ByteSizeValue(unassigned + assigned).toString()?

Member


And if not, bytes should be appended to the message.
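For context, a rough plain-Java stand-in for the suggested formatting (the helper below is invented for illustration; `ByteSizeValue` is the real Elasticsearch class, which renders counts like `1.5gb`):

```java
import java.util.Locale;

public class HumanBytes {
    private static final String[] UNITS = {"b", "kb", "mb", "gb", "tb", "pb"};

    // Roughly mimics ByteSizeValue#toString: largest fitting unit, one decimal.
    public static String format(long bytes) {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return unit == 0 ? bytes + "b" : String.format(Locale.ROOT, "%.1f%s", value, UNITS[unit]);
    }
}
```

With this, a message like `needs 1610612736` becomes `needs 1.5gb`, which is far easier to read in an autoscaling explanation.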

Contributor Author


👍, b58d294

@henningandersen henningandersen merged commit 5e20c0a into elastic:master Dec 13, 2020
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 13, 2020
* elastic/master:
  Autoscaling reactive storage decider (elastic#65520)
  Fix TranslogTests#testStats (elastic#66227)
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Dec 13, 2020
henningandersen added a commit that referenced this pull request Dec 14, 2020

Labels

:Distributed/Autoscaling Automatically adding or removing nodes in a cluster >non-issue Team:Distributed Meta label for distributed team. v7.11.0 v8.0.0-alpha1

4 participants