Limit shard realocation retries#90296
Conversation
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
Hi @idegtiarenko, I've created a changelog YAML for you. |
DaveCTurner
left a comment
There was a problem hiding this comment.
I left some initial comments.
| /** | ||
| * Holds additional information as to why the shard failed to relocate. | ||
| */ | ||
| public class RelocationFailureInfo implements ToXContentFragment, Writeable { |
There was a problem hiding this comment.
I wonder, do we need a whole new object for this or could we just use a plain int? We don't really need to distinguish null from 0 I think.
If we really want an object I'd still rather it was never null, and maybe use a record instead?
There was a problem hiding this comment.
Converted to a record. I wanted to keep it a class/record in case we want to make this behavior more complex in the future (similar to UnassignedInfo with recording failed nodes or introducing a delay)
server/src/main/java/org/elasticsearch/cluster/routing/RoutingNodes.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/RelocationFailureInfoTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/RelocationFailureInfoTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/RelocationFailureInfo.java
Outdated
Show resolved
Hide resolved
…ionFailureInfo.java Co-authored-by: David Turner <david.turner@elastic.co>
|
UPD: The failure is caused by:
|
| unassignedInfo.toXContent(builder, params); | ||
| } | ||
| if (relocationFailureInfo != RelocationFailureInfo.NO_FAILURES) { | ||
| relocationFailureInfo.toXContent(builder, params); |
There was a problem hiding this comment.
Do we want to show this unconditionally?
There was a problem hiding this comment.
I think it's friendlier to clients if we don't change the response shape like this, so let's make it unconditional. NB RelocationFailureInfo#toXContent only emits the field if it's nonzero - it'd be better not to have that conditionality too.
There was a problem hiding this comment.
Do we have any clients that fail on unknown fields that we need to notify about the change?
server/src/main/java/org/elasticsearch/cluster/routing/RelocationFailureInfo.java
Outdated
Show resolved
Hide resolved
# Conflicts: # server/src/main/java/org/elasticsearch/cluster/routing/ShardRouting.java
This change ensures that elasticsearch would not indefinitely retry relocating shard if operation fails.
Closes: #79445