
Move experimental frozen to frozen shard limit#71781

Merged
henningandersen merged 10 commits into elastic:master from henningandersen:fix_upgrade_shard_limit_pr2
Apr 20, 2021

Conversation

@henningandersen
Contributor

@henningandersen henningandersen commented Apr 16, 2021

Frozen indices created on 7.12 would not belong to the frozen shard
limit group; now we convert them when the last node is upgraded.

Relates #71392

This PR currently includes #71777, but that will be merged to master before this goes in. (done)

Additionally, this PR has a few outstanding version related topics (todos), pending merge of #71760.

If master ends up on a newer version than other cluster members, we
cannot apply the new index setting for shard limits. We skip doing so
for now.
Frozen indices created on 7.12 would not belong to the frozen shard
limit group; now we convert them when the last node is upgraded.

Relates elastic#71392
@henningandersen henningandersen added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.13.0 labels Apr 16, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Apr 16, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Comment on lines +154 to +160
.put(DiskThresholdDecider.SETTING_IGNORE_DISK_WATERMARKS.getKey(), true);

// we cannot apply this setting during rolling upgrade.
// todo: update version after backport
if (minNodeVersion.onOrAfter(Version.V_8_0_0)) {
settings.put(ShardLimitValidator.INDEX_SETTING_SHARD_LIMIT_GROUP.getKey(), ShardLimitValidator.FROZEN_GROUP);
}
Contributor Author


All changes in this file are contained in #71777

Contributor Author


#71777 has now been merged and integrated here.

@henningandersen
Contributor Author

Failure fixed by #71778

@henningandersen
Contributor Author

@elasticmachine run elasticsearch-ci/2

henningandersen added a commit that referenced this pull request Apr 17, 2021
Frozen indices (partial searchable snapshots) require less heap per
shard and the limit can therefore be raised for those. We pick 3000
frozen shards per frozen data node, since we think 2000 is reasonable
to use in production.

Relates #71042 and #34021

Includes #71781 and #71777
@henningandersen
Contributor Author

@elasticmachine update branch

Member

@jasontedor jasontedor left a comment


I left a few comments, but I don’t see a need for another round.

return ClusterState.builder(currentState).metadata(builder).build();
}

private static boolean wrongShardLimitGroup(org.elasticsearch.common.settings.Settings settings) {
Member


This is a nit: how about notFrozenShardLimitGroup?
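
The suggested rename reads naturally as a predicate. A minimal sketch of what such a check might look like, using a plain `Map` as a stand-in for Elasticsearch's `Settings` class (the key string here is illustrative; the real code reads the constants from `ShardLimitValidator`):

```java
import java.util.Map;

// Simplified stand-in for the predicate under discussion: true when an
// index's settings do not yet carry the frozen shard limit group.
public class ShardLimitGroupCheck {
    // Assumed key/value, mirroring ShardLimitValidator's constants.
    static final String SHARD_LIMIT_GROUP_KEY = "index.shard_limit.group";
    static final String FROZEN_GROUP = "frozen";

    static boolean notFrozenShardLimitGroup(Map<String, String> settings) {
        return FROZEN_GROUP.equals(settings.get(SHARD_LIMIT_GROUP_KEY)) == false;
    }

    public static void main(String[] args) {
        // Index missing the group: needs conversion.
        System.out.println(notFrozenShardLimitGroup(Map.of())); // prints true
        // Index already in the frozen group: nothing to do.
        System.out.println(notFrozenShardLimitGroup(
            Map.of(SHARD_LIMIT_GROUP_KEY, FROZEN_GROUP))); // prints false
    }
}
```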


@Override
public void onFailure(String source, Exception e) {
logger.warn("upgrading frozen indices to have frozen shard limit group failed", e);
Member


maybe add , will retry on the next cluster state update?

Member


That is, we are letting the user know that something happened, at the warn level. Then they're going to wonder what to do. So let's let them know: you don't need to take any action, we'll do it for you. Then this is a purely informational message from their perspective.

Contributor Author


++, fixed in 01c1b3e

}

public void initialize() {
clusterService.addListener(this::clusterChanged);
Member


Maybe we can remove this listener after the upgrade succeeds?
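
The deregister-when-done pattern being suggested can be sketched as follows. This is a plain-Java mock, not the actual Elasticsearch `ClusterService` API: `Registry` and its methods are illustrative stand-ins showing a listener that removes itself once the upgrade condition is observed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal sketch: a listener that deregisters itself after its one-time
// job (here, reacting to an "upgraded" state) completes.
public class SelfRemovingListener {
    static class Registry {
        // Copy-on-write so listeners may remove themselves mid-notification.
        final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();
        void addListener(Consumer<String> l) { listeners.add(l); }
        void removeListener(Consumer<String> l) { listeners.remove(l); }
        void publish(String state) { listeners.forEach(l -> l.accept(state)); }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        @SuppressWarnings("unchecked")
        Consumer<String>[] holder = new Consumer[1];
        holder[0] = state -> {
            if ("upgraded".equals(state)) {
                // Upgrade succeeded: no further state updates are needed,
                // so stop listening.
                registry.removeListener(holder[0]);
            }
        };
        registry.addListener(holder[0]);
        registry.publish("upgraded");
        System.out.println(registry.listeners.size()); // prints 0
    }
}
```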

.put(indexMetadata.getSettings())
.put(ShardLimitValidator.INDEX_SETTING_SHARD_LIMIT_GROUP.getKey(), ShardLimitValidator.FROZEN_GROUP)
)
.settingsVersion(indexMetadata.getSettingsVersion() + 1)
Member


😉

@henningandersen
Contributor Author

@elasticmachine update branch

@henningandersen
Contributor Author

@elasticmachine update branch

@henningandersen henningandersen merged commit 3c7c0c6 into elastic:master Apr 20, 2021
henningandersen added a commit that referenced this pull request Apr 20, 2021
Frozen indices created on 7.12 would not belong to the frozen shard
limit group; now we convert them when the last node is upgraded.

Relates #71392
henningandersen added a commit that referenced this pull request Apr 20, 2021
The versions to upgrade erroneously extended until 8.0; now set to 7.13.

Relates #71781
