Do not mark bulk indexing requests as retried after primary relocations

When a shard relocation occurs, in-flight indexing operations wait to acquire a primary permit until the relocation completes. Upon successful relocation, these operations receive a [`RetryOnPrimaryException`](https://github.com/elastic/elasticsearch/blob/c3f07ad1ec13ba9c688adbea9c094563cfd69bff/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L484-L494), which [marks the request as retried](https://github.com/elastic/elasticsearch/blob/c3f07ad1ec13ba9c688adbea9c094563cfd69bff/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L1088-L1096). This [forces a version lookup](https://github.com/elastic/elasticsearch/blob/184c51b9edcf82fada430492acd28f4e336a0364/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L1516-L1519) during indexing on the new primary, even for operations that could have used the auto-generated ID append-only optimization.
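To illustrate why the retried flag matters, here is a minimal sketch of the decision described above. It is a simplified model, not the actual `InternalEngine` logic: the class `AppendOnlySketch`, the `IndexOp` type, and the lookup counter are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model, NOT Elasticsearch code: every name below is hypothetical.
final class AppendOnlySketch {

    /** Minimal stand-in for an index request. */
    static final class IndexOp {
        final String id;
        final boolean autoGeneratedId;
        final boolean retried;

        IndexOp(String id, boolean autoGeneratedId, boolean retried) {
            this.id = id;
            this.autoGeneratedId = autoGeneratedId;
            this.retried = retried;
        }
    }

    private final Map<String, Long> versions = new HashMap<>();
    private int versionLookups = 0;

    /** Indexes one operation, resolving the current version only when the fast path is not allowed. */
    void index(IndexOp op) {
        boolean canAppendOnly = op.autoGeneratedId && !op.retried;
        if (canAppendOnly) {
            // Fast path: a fresh auto-generated ID cannot already exist, so no lookup is needed.
            versions.put(op.id, 1L);
            return;
        }
        // Slow path: the document may already exist, so its current version is resolved first.
        versionLookups++;
        long next = versions.getOrDefault(op.id, 0L) + 1;
        versions.put(op.id, next);
    }

    /** Number of operations that took the slow path. */
    int versionLookups() {
        return versionLookups;
    }
}
```

In this model, an operation with an auto-generated ID takes the slow path purely because it was marked as retried, which is the cost this issue aims to avoid.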

For shards with high indexing throughput, relocations can take a while to complete. During this time, many operations queue up, all marked as retried. When they finally execute on the new primary, they must perform version lookups (requiring at least a terms dictionary lookup) with a likely cold cache. This can overwhelm the new primary shard.

We should detect this situation and avoid marking the request as retried: an operation that was still waiting for a primary permit never executed on the old primary, so we can safely assume its documents were not indexed there.
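One possible shape for this check is sketched below. This is only an assumption about how the distinction could be expressed, not the existing `TransportReplicationAction` behavior: the `RetryCause` enum and the `shouldMarkAsRetried` helper are hypothetical names that do not exist in Elasticsearch.

```java
// Hypothetical sketch, NOT Elasticsearch code: names and structure are assumptions.
final class RetryMarkingSketch {

    /** Why an operation is being re-sent to the primary. */
    enum RetryCause {
        /** The primary relocated while the operation was still waiting for a primary permit. */
        RELOCATED_BEFORE_PERMIT,
        /** Any other retry, where the operation may already have been applied. */
        OTHER
    }

    /**
     * Decides whether the request should carry the retried flag.
     * An operation that never acquired a permit cannot have indexed anything,
     * so it can keep using the append-only fast path.
     */
    static boolean shouldMarkAsRetried(RetryCause cause) {
        return cause != RetryCause.RELOCATED_BEFORE_PERMIT;
    }

    private RetryMarkingSketch() {}
}
```

The key design point is that the retried flag stays conservative by default: only the one case where non-execution on the old primary is certain skips the flag.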

Do not mark bulk indexing requests as retried after primary relocations #141586
