
Do not mark bulk indexing requests as retried after primary relocations #141586

@fcofdez


When a shard relocation occurs, in-flight indexing operations wait to acquire a primary permit until the relocation completes. Upon successful relocation, these operations receive a RetryOnPrimaryException, which marks the request as retried. This forces a version lookup during indexing on the new primary, even for operations that could have used the auto-generated ID append-only optimization.
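The effect described above can be illustrated with a deliberately simplified, hypothetical model. None of these names (`AppendOnlyDecision`, `IndexOp`, `markRetried`, `choosePath`) are Elasticsearch classes or APIs. The sketch only assumes the rule stated here: an operation with an auto-generated ID can take the append-only path unless it has been marked as retried, in which case a version lookup is required.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the indexing-path decision described in the issue.
 * These are illustrative names only, not Elasticsearch classes.
 */
public class AppendOnlyDecision {

    /** Minimal stand-in for a bulk item, keeping only the fields relevant to the decision. */
    static final class IndexOp {
        final boolean autoGeneratedId; // true if the document ID was auto-generated
        private boolean retry;          // set once the request has been marked as retried

        IndexOp(boolean autoGeneratedId) {
            this.autoGeneratedId = autoGeneratedId;
        }

        /** Hypothetical equivalent of "marking the request as retried". */
        void markRetried() {
            this.retry = true;
        }

        boolean isRetry() {
            return retry;
        }
    }

    enum IndexPath { APPEND_ONLY, VERSION_LOOKUP }

    /**
     * Auto-generated IDs allow skipping the version lookup, unless the operation
     * may already have been indexed (modelled here as "it was marked as retried").
     */
    static IndexPath choosePath(IndexOp op) {
        Objects.requireNonNull(op);
        if (op.autoGeneratedId && !op.isRetry()) {
            return IndexPath.APPEND_ONLY;
        }
        return IndexPath.VERSION_LOOKUP;
    }

    public static void main(String[] args) {
        IndexOp fresh = new IndexOp(true);
        System.out.println(choosePath(fresh)); // APPEND_ONLY

        IndexOp afterRelocation = new IndexOp(true);
        afterRelocation.markRetried(); // models the effect of RetryOnPrimaryException
        System.out.println(choosePath(afterRelocation)); // VERSION_LOOKUP
    }
}
```

In this model the retry flag alone is what pushes an auto-generated-ID operation off the append-only path, which is why marking queued operations as retried is costly.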

For shards with high indexing throughput, relocations can take a long time to complete. During this time, many operations queue up, and all of them are marked as retried. When they finally execute on the new primary, each must perform a version lookup (requiring at least a terms dictionary lookup), likely against a cold cache. This can overwhelm the new primary shard.

We should detect this situation and avoid marking the request as retried, since we can safely assume the documents were not indexed on the old primary.
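A minimal sketch of the proposed behaviour, under the assumption (taken from the issue) that operations still waiting for a primary permit when the relocation completes never reached the old primary. The names `acquiredPermitOnOldPrimary`, `onRelocationCurrent`, `onRelocationProposed` and `countLookups` are hypothetical and do not correspond to Elasticsearch code; the point is only to compare how many queued operations would need a version lookup on the new primary under each policy.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: operations still queued for a primary permit when the
 * relocation completes are re-routed without being marked as retried.
 * All names are illustrative, not Elasticsearch APIs.
 */
public class RelocationRetrySketch {

    static final class IndexOp {
        final boolean autoGeneratedId;
        final boolean acquiredPermitOnOldPrimary; // false: the op was still queued
        boolean retry;

        IndexOp(boolean autoGeneratedId, boolean acquiredPermitOnOldPrimary) {
            this.autoGeneratedId = autoGeneratedId;
            this.acquiredPermitOnOldPrimary = acquiredPermitOnOldPrimary;
        }
    }

    /** Current behaviour: every op rejected after the relocation is marked as retried. */
    static void onRelocationCurrent(IndexOp op) {
        op.retry = true;
    }

    /** Proposed behaviour: only mark ops that may have been indexed on the old primary. */
    static void onRelocationProposed(IndexOp op) {
        if (op.acquiredPermitOnOldPrimary) {
            op.retry = true;
        }
    }

    /** In this simplified model, a lookup is needed unless the ID is auto-generated and not a retry. */
    static boolean needsVersionLookup(IndexOp op) {
        return !op.autoGeneratedId || op.retry;
    }

    /** Counts the version lookups the new primary would perform for a batch of ops. */
    static long countLookups(List<IndexOp> ops) {
        return ops.stream().filter(RelocationRetrySketch::needsVersionLookup).count();
    }

    /** Builds n queued ops with auto-generated IDs that never acquired a permit. */
    static List<IndexOp> queuedOps(int n) {
        List<IndexOp> ops = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ops.add(new IndexOp(true, false));
        }
        return ops;
    }

    public static void main(String[] args) {
        List<IndexOp> current = queuedOps(1000);
        current.forEach(RelocationRetrySketch::onRelocationCurrent);

        List<IndexOp> proposed = queuedOps(1000);
        proposed.forEach(RelocationRetrySketch::onRelocationProposed);

        System.out.println("current:  " + countLookups(current) + " version lookups");  // 1000
        System.out.println("proposed: " + countLookups(proposed) + " version lookups"); // 0
    }
}
```

The design choice in the sketch is to key the decision on whether the operation could have touched the old primary at all, rather than on the exception type, so that operations which did acquire a permit there are still treated conservatively.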


Labels: :Distributed/CRUD, >enhancement, Team:Distributed
