SoftDeletesRetentionMergePolicy#numDeletesToMerge causes indexing backlog #75675

@easyice


When a shard holds a very large number of soft-deleted documents that are still retained by a retention lease, the numDeletesToMerge function becomes a performance bottleneck.

For instance, suppose an update-heavy indexing workload is writing to Elasticsearch and we then relocate a primary shard to another node. If the relocation takes a long time, the old shard grows very large, because the soft-deleted operations must be held by the retention lease. The more soft-deleted documents there are, the slower indexing becomes. With a shard size of about 20GB, we get the flamegraph below:

image

flamegraph.html.zip

In this case the write queue stays backlogged, and we get the jstack output below:

1.txt

and the indices stats:

health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex-2021.07.26 wH2C74XtRRaO8O3KBrLu1A   6   0   73873732     19975538     76.3gb         76.3gb

The _cat/shards/ output (once relocation is done, the shard shrinks back to the same size as the other shards):

image

In #35594, a cache is added for numDeletesToMerge. I backported that PR and re-ran my test case, and the issue was resolved.
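To illustrate the idea of caching numDeletesToMerge, here is a minimal sketch. It is not the code from #35594 and does not use the Lucene API: the class `DeletesToMergeCache`, the `version` parameter and the `expensiveCount` callback are all hypothetical names made up for this example. It only shows the general technique of reusing a per-segment count until something that affects it changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntFunction;

// Hypothetical sketch, not the code from #35594: memoize an expensive
// per-segment count until the caller-supplied version changes.
final class DeletesToMergeCache {

    // Cached result together with the version it was computed for.
    private static final class Entry {
        final long version;
        final int count;

        Entry(long version, int count) {
            this.version = version;
            this.count = count;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final ToIntFunction<String> expensiveCount;

    // expensiveCount stands in for the costly scan over retained soft deletes.
    DeletesToMergeCache(ToIntFunction<String> expensiveCount) {
        this.expensiveCount = expensiveCount;
    }

    // version is an assumption: it must change whenever the segment's deletes
    // or the retention boundary change, otherwise stale counts are returned.
    int numDeletesToMerge(String segmentName, long version) {
        Entry cached = entries.get(segmentName);
        if (cached != null && cached.version == version) {
            return cached.count; // cache hit: skip the expensive scan
        }
        int count = expensiveCount.applyAsInt(segmentName);
        entries.put(segmentName, new Entry(version, count));
        return count;
    }

    // Drop the entry for a segment that no longer exists (for example, merged away).
    void forget(String segmentName) {
        entries.remove(segmentName);
    }
}
```

With such a cache, repeated merge-policy checks against an unchanged segment would pay for the expensive count only once. The hard part in a real implementation is choosing an invalidation key that covers both new deletes and an advancing retention lease, which this sketch leaves to the caller.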

@s1monw I think the PR can be reconsidered

My Elasticsearch version: 7.6.2 with the LUCENE-9228 backport

Labels

:Distributed/Engine (Anything around managing Lucene and the Translog in an open shard.)
>bug
Team:Distributed (Meta label for distributed team.)