Frozen tier autoscaling decider based on shards #71042
henningandersen merged 14 commits into elastic:master from
Conversation
The frozen tier only holds shared cache searchable snapshots. This commit adds an autoscaling decider that scales the total memory on the tier adequately to hold the shards. A frozen shard is assigned a memory size of 64GB/2000, i.e., each 64GB node can hold 2000 shards before scaling further.
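The sizing rule above can be sketched as standalone arithmetic (plain Java; the class and method names here are illustrative, not the actual decider code):

```java
public class FrozenTierSizing {
    // A 64 GB node is assumed to hold 2000 frozen shards, so each shard
    // is charged 64 GB / 2000 bytes of memory (about 33 MB).
    static final long NODE_MEMORY_BYTES = 64L * 1024 * 1024 * 1024;
    static final long MEMORY_PER_SHARD = NODE_MEMORY_BYTES / 2000;

    // Total tier memory the decider would ask for, given the frozen shard count.
    static long requiredTierMemory(long frozenShards) {
        return frozenShards * MEMORY_PER_SHARD;
    }

    public static void main(String[] args) {
        // 2000 shards fit within one 64 GB node; 2001 push past it.
        System.out.println(requiredTierMemory(2000) <= NODE_MEMORY_BYTES); // true
        System.out.println(requiredTierMemory(2001) > NODE_MEMORY_BYTES);  // true
    }
}
```

Note the integer division: 64 GB does not divide evenly by 2000, so 2000 shards come out marginally under one node's worth of memory.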
Pinging @elastic/es-distributed (Team:Distributed)
String tierPreference = DataTierAllocationDecider.INDEX_ROUTING_PREFER_SETTING.get(indexSettings);
String[] preferredTiers = DataTierAllocationDecider.parseTierList(tierPreference);
if (preferredTiers.length >= 1 && preferredTiers[0].equals(DataTier.DATA_FROZEN)) {
    // todo: add this line once frozen is only mounted using DATA_FROZEN
Frozen indices (partial searchable snapshots) require less heap per shard and the limit can therefore be raised for those. We pick 3000 frozen shards per frozen data node, since we think 2000 is reasonable to use in production. Relates elastic#71042
The reactive decider no longer applies to the frozen tier, since it could grossly over-estimate the amount of storage for unassigned frozen shards. Relates elastic#71042
dakrone
left a comment
LGTM, I left a few really minor comments. I wouldn't say I'm an expert in the autoscaling stuff (far from it!), but this looks good to me.
import org.elasticsearch.xpack.autoscaling.LocalStateAutoscaling;
import org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots;

public class LocalStateAutoscalingAndSearchableSnapshots extends LocalStateAutoscaling {
For my own edification and/or someone else reading this code, can you add a comment about why this plugin wrapper is required?
I added a comment in 1c4c4ae
I'm no expert on this or the conventions here. I think adding a LocalStateSearchablesnapshots here and using both LocalState plugins from the test could also work, but I found this nicer.
...ternalClusterTest/java/org/elasticsearch/xpack/autoscaling/shards/FrozenShardsDeciderIT.java
public static final Setting<ByteSizeValue> MEMORY_PER_SHARD = Setting.byteSizeSetting(
    "memory_per_shard",
    (dummy) -> DEFAULT_MEMORY_PER_SHARD.getStringRep()
);
Should this validate that memory_per_shard is not negative?
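A non-negative check could look like the following standalone sketch (the parse helper is hypothetical and does not reproduce the real Setting.byteSizeSetting API, which also has overloads taking minimum/maximum bounds):

```java
public class MemoryPerShardSetting {
    // Simplified stand-in for a byte-size setting with a lower bound of zero;
    // a negative memory-per-shard value makes no sense for capacity sizing.
    static long parseMemoryPerShard(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException(
                "memory_per_shard must be non-negative, got [" + bytes + "]");
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(parseMemoryPerShard(34_359_738L)); // accepted
        try {
            parseMemoryPerShard(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```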
private final long shards;

public FrozenShardsReason(long shards) {
    this.shards = shards;
Should we validate this is non-negative, and then use readVLong()/writeVLong() in serialization? It's not a huge deal, just curious whether we expect this to ever be negative.
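The readVLong/writeVLong suggestion refers to variable-length encoding. A minimal sketch of the technique (a simplified stand-in, not Elasticsearch's actual StreamOutput/StreamInput implementation) shows why it pairs naturally with a non-negative check:

```java
import java.io.ByteArrayOutputStream;

public class VLong {
    // Write a non-negative long as a base-128 varint: 7 payload bits per
    // byte, high bit set on all but the last byte. A negative value would
    // always take the maximum width, which is why callers validate >= 0.
    static byte[] writeVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    static long readVLong(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        long shards = 3000;
        byte[] encoded = writeVLong(shards);
        // A small shard count fits in two bytes instead of a fixed eight.
        System.out.println(encoded.length);            // 2
        System.out.println(readVLong(encoded) == shards); // true
    }
}
```

For a shard count that is realistically in the thousands, the varint form is a few bytes on the wire versus eight for a fixed-width long.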
jasontedor
left a comment
I left a couple of comments.
static final ByteSizeValue DEFAULT_MEMORY_PER_SHARD = ByteSizeValue.ofBytes(MAX_MEMORY.getBytes() / 2000);
public static final Setting<ByteSizeValue> MEMORY_PER_SHARD = Setting.byteSizeSetting(
    "memory_per_shard",
    (dummy) -> DEFAULT_MEMORY_PER_SHARD.getStringRep(),
Some style guides would recommend avoiding the use of terms such as this. How about ignored or unused?
static long countFrozenShards(Metadata metadata) {
    return StreamSupport.stream(metadata.spliterator(), false)
        .filter(imd -> isFrozenIndex(imd.getSettings()))
        .mapToLong(IndexMetadata::getTotalNumberOfShards)
I’m curious why mapToLong instead of mapToInt here, and IntStream#sum returning an int then?
Future proofing 🙂: it avoids having to think about (int * long) overflow, and since we only support 64-bit it does not matter. I changed it to an int in 6c5ecde since this seems to be the "convention" for such a count.
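The int-versus-long tradeoff in the exchange above can be seen in a standalone example (the helper names and shard counts here are illustrative, not the actual decider code):

```java
import java.util.stream.IntStream;

public class ShardCountSum {
    // Summing shard counts as ints: IntStream#sum returns an int and can
    // silently wrap around on overflow.
    static int sumAsInt(int... counts) {
        return IntStream.of(counts).sum();
    }

    // Mapping to long first sidesteps any overflow reasoning for realistic
    // cluster-wide shard counts.
    static long sumAsLong(int... counts) {
        return IntStream.of(counts).mapToLong(i -> i).sum();
    }

    public static void main(String[] args) {
        int half = Integer.MAX_VALUE / 2; // 1073741823
        // Three such counts exceed Integer.MAX_VALUE: the int sum wraps negative.
        System.out.println(sumAsInt(half, half, half) < 0); // true
        System.out.println(sumAsLong(half, half, half));    // 3221225469
    }
}
```

In practice a cluster never holds billions of shards, which is why an int was judged acceptable as the "convention" for such a count.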
Thanks Lee and Jason!
Added documentation for the frozen shards decider. Relates #71042
The reactive decider no longer applies to the frozen tier, since it could grossly over-estimate the amount of storage for unassigned frozen shards. Relates #71042
Frozen indices (partial searchable snapshots) require less heap per shard and the limit can therefore be raised for those. We pick 3000 frozen shards per frozen data node, since we think 2000 is reasonable to use in production. Relates elastic#71042 and elastic#34021
The max shards validation limit will be relaxed in a separate PR.