TRA waits when an index doesn't exist but fails immediately when shard is not found

TransportReplicationAction (TRA) is currently inconsistent in how it handles requests that refer to things that don't exist (as opposed to things that exist but are unavailable).

1) When an index is not found in the cluster state, we go into a retry loop and wait for the index to appear.
2) When a request comes in for a shard that doesn't exist (i.e., the shard id is higher than the number of shards), we fail immediately, as it will never appear.
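To make the two cases concrete, here is a minimal sketch of the decision as described above. It is not the actual TRA code: `CurrentRerouteSketch`, `Outcome`, and `decide` are hypothetical names, and a plain map of index name to shard count stands in for the cluster state.

```java
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in this issue.
// None of these names are real Elasticsearch classes.
final class CurrentRerouteSketch {
    enum Outcome { PROCEED, RETRY_ON_CLUSTER_STATE_CHANGE, FAIL_IMMEDIATELY }

    // indexToShardCount stands in for the cluster state: index name -> number of shards.
    static Outcome decide(Map<String, Integer> indexToShardCount, String index, int shardId) {
        Integer numberOfShards = indexToShardCount.get(index);
        if (numberOfShards == null) {
            // Case 1: the index is unknown, so we wait for it to appear.
            return Outcome.RETRY_ON_CLUSTER_STATE_CHANGE;
        }
        if (shardId >= numberOfShards) {
            // Case 2: the shard id is out of range (ids are zero-based), so we fail right away.
            return Outcome.FAIL_IMMEDIATELY;
        }
        return Outcome.PROCEED;
    }
}
```

Both inputs refer to something that does not exist, yet only the second one fails fast.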

This is surprising and we should fix it.

In my opinion we should:
1) Require ReplicationRequests to have a complete ShardId by the time they reach the reroute phase in TRA.
2) Fail immediately when that shard id cannot be resolved.
3) Change TransportIndexAction and similar write actions to resolve incoming requests and set their proper shard id (with index uuid). If they need to create the index, they can go ahead, but then it's up to them to also wait until the current (data) node knows about the index that was just created. We can have a shared utility method for this on `AutoCreateIndex`.
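A rough sketch of how the proposed split could look, purely for discussion. Every name here (`ProposedResolveSketch`, `IndexInfo`, `ResolvedShardId`, `resolve`, `reroute`) is hypothetical and not an existing Elasticsearch class or method; a map of known indices stands in for the local node's cluster state, and index creation plus the wait for the node to learn about the new index are left out.

```java
import java.util.Map;

// Hypothetical sketch of the proposal; these are not Elasticsearch classes.
final class ProposedResolveSketch {
    // What the local node knows about an index.
    static final class IndexInfo {
        final String uuid;
        final int numberOfShards;
        IndexInfo(String uuid, int numberOfShards) {
            this.uuid = uuid;
            this.numberOfShards = numberOfShards;
        }
    }

    // A "complete" shard id: index name, index uuid, and shard number.
    static final class ResolvedShardId {
        final String indexName;
        final String indexUuid;
        final int shard;
        ResolvedShardId(String indexName, String indexUuid, int shard) {
            this.indexName = indexName;
            this.indexUuid = indexUuid;
            this.shard = shard;
        }
    }

    // Write-action side: resolve the request before handing it to the reroute phase.
    // A real write action would create the index and wait for it here instead of throwing.
    static ResolvedShardId resolve(Map<String, IndexInfo> knownIndices, String index, int shard) {
        IndexInfo info = knownIndices.get(index);
        if (info == null) {
            throw new IllegalStateException("index not found: " + index);
        }
        if (shard < 0 || shard >= info.numberOfShards) {
            throw new IllegalArgumentException("no such shard: " + index + "[" + shard + "]");
        }
        return new ResolvedShardId(index, info.uuid, shard);
    }

    // Reroute side: a complete shard id either matches the current state or fails immediately.
    static void reroute(Map<String, IndexInfo> knownIndices, ResolvedShardId id) {
        IndexInfo info = knownIndices.get(id.indexName);
        if (info == null || !info.uuid.equals(id.indexUuid) || id.shard >= info.numberOfShards) {
            throw new IllegalStateException("cannot resolve shard " + id.indexName + "[" + id.shard + "]");
        }
        // ... proceed to the primary phase ...
    }
}
```

With this split, the reroute phase never waits: any retry or wait logic lives in the write action that resolved (or created) the index.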


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TRA waits when an index doesn't exist but fails immediately when shard is not found #20279
