Add cluster-wide shard limit #32856
Conversation
Adds a safety limit on the number of shards in a cluster, based on the number of nodes in the cluster. The limit is checked on operations that add (or activate) shards, such as index creation, snapshot restoration, and opening closed indices, and can be changed via the cluster settings API. Closes elastic#20705
Based on review feedback. Either can be used to set the per-node shard limit, so let's verify both.
During cluster startup, a cluster may consist only of dedicated master nodes, with no data nodes. In this case, we want to allow the user to configure the cluster until the data nodes come online.
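That startup carve-out, plus the node-count-based cap, can be sketched in isolation. This is a hypothetical standalone sketch; `ShardLimitCheck`, `checkShardLimit`, and its parameters are illustrative names, not the PR's actual API:

```java
// Hypothetical sketch of a cluster-wide shard limit check.
// Returns an error message if the operation would exceed the limit, else null.
public class ShardLimitCheck {
    public static String checkShardLimit(int newShards, int currentOpenShards,
                                         int dataNodeCount, int maxShardsPerNode) {
        if (dataNodeCount == 0) {
            // During startup the cluster may contain only master, non-data nodes;
            // permit configuration until data nodes join.
            return null;
        }
        int maxShardsInCluster = maxShardsPerNode * dataNodeCount;
        if (currentOpenShards + newShards > maxShardsInCluster) {
            return "this action would add [" + newShards + "] shards, but the cluster "
                + "currently has [" + currentOpenShards + "]/[" + maxShardsInCluster
                + "] maximum shards open";
        }
        return null;
    }
}
```

The key design point is that the cap scales with the number of data nodes, so adding capacity automatically raises the limit.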
public static final Setting<Integer> SETTING_CLUSTER_MAX_SHARDS_PER_NODE =
    Setting.intSetting("cluster.shards.max_per_node", 1000, Property.Dynamic, Property.NodeScope);
Can we set a minimum for this setting of 1 shard per node? (so that people don't set it to -171 and expect weird things)
I wonder if a higher minimum is warranted -- e.g., to ensure if we are setting up a new cluster we can create a .kibana index and so on?
The default here is 1000, so we will be fine out of the box. The question here is the minimum, and there is not a good value, as there are many indices that might be created (.kibana, .security, Watcher, .monitoring, etc.). It is too hard to find the right minimum to ensure the basics of our stack function and to keep this value properly maintained. If someone really does want to set the value to one shard per node, I think we should permit that.
Added a minimum of 1 - I agree with Jason, I think trying to figure out any other minimum would be very complicated.
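As a standalone illustration of what a minimum of 1 buys, here is a hypothetical sketch of the kind of bounds check `Setting.intSetting` applies when a minimum is supplied; the class and method names below are invented for the example:

```java
// Hypothetical sketch: reject values below the configured minimum of 1,
// so a user setting cluster.shards.max_per_node to -171 fails fast.
public class MinValidation {
    public static int parseMaxShardsPerNode(String value) {
        int parsed = Integer.parseInt(value);
        if (parsed < 1) {
            throw new IllegalArgumentException("Failed to parse value [" + value
                + "] for setting [cluster.shards.max_per_node] must be >= 1");
        }
        return parsed;
    }
}
```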
Based on review comments in elastic#32856
jasontedor left a comment
This looks like a good start. I think the implementation here misses a critical case which is updating index settings to increase the number of replicas. For example, I think the following would be permitted with the current implementation:
- set the limit to 1 shard per node
- start two nodes
- create an index `i` and an index `j` with `index.number_of_replicas` set to zero, and the default number of shards
- now, creating a third index will be blocked by the max shards per node limit 🎉
- however, a settings update on `i` and `j` to increase `index.number_of_replicas` to one would be permitted, yet this would put the cluster over the limit 😢
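The missing check described in that scenario can be sketched in isolation: before applying a replica increase, compute the shards it would add and compare against the node-count-based cap. Everything below (class, method, and parameter names) is a hypothetical illustration, not the PR's code:

```java
// Hypothetical sketch: validate a replica-count change against the
// cluster-wide shard limit before applying the settings update.
public class ReplicaUpdateCheck {
    /** Returns an error message, or null if the update fits under the limit. */
    public static String checkReplicaIncrease(int numIndices, int shardsPerIndex,
                                              int oldReplicas, int newReplicas,
                                              int currentTotalShards, int dataNodes,
                                              int maxShardsPerNode) {
        int replicaDelta = newReplicas - oldReplicas;
        if (replicaDelta <= 0) {
            return null; // decreasing replicas can never exceed the limit
        }
        int addedShards = numIndices * shardsPerIndex * replicaDelta;
        int limit = maxShardsPerNode * dataNodes;
        if (currentTotalShards + addedShards > limit) {
            return "cannot increase replicas: would add [" + addedShards
                + "] shards, exceeding the cluster limit of [" + limit + "]";
        }
        return null;
    }
}
```

In the reviewer's scenario (limit of 1 shard per node, two nodes, two one-shard indices already open), raising `index.number_of_replicas` from zero to one on both indices would add two shards against a cap of two already-used slots, so the update is rejected.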
Per discussion on elastic#32856, the cluster-wide shard limit is now enforced when changing the number of replicas used by an index.
Jason makes an excellent point - I simply forgot about that case. I've added code to handle changing the replica settings, as well as several test cases. Additionally, following the rule of three, I've factored some shared logic out into a shared method.
It appears that ActionRequestValidationException tends to be used for more client-related purposes, and ValidationException is more appropriate here.
@elasticmachine retest this please
This might already be planned, but I think we might want to add some kind of deprecation warning to 6.x to explain to the user that this breaking change is coming in 7.0 if they are over the limit.
The subclasses have been removed; this cleans up a couple of remaining instances of the subclasses in the tests.
Per discussion with @jasontedor, I'm closing this PR, pending a new PR which implements deprecation warnings for clusters with shard counts above the default limit. This will be followed shortly by opt-in enforcement (to be backported to 6.x) and enforcement by default in 7.0+.