Downsample retry issue #93580

@martijnvg

Currently, when the ILM rollup step is retried, the same target index is reused. As a result, the subsequent downsample API invocation can index rolled-up data into shards of a target index that already exists, while the previous downsample API invocation is still partially running and also rolling up data into the same target shards.

Note that the rollup step may fail when a cluster is restarted in a rolling manner (for example, for an upgrade) or when the elected master node fails, because the downsample action is coordinated from the elected master node.

The fix would be that the RollupStep isn't retried on its own in case it fails to execute; instead, the step that generates the target index name for downsampling is retried first, followed by the rollup step.
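
To illustrate the difference between the two retry strategies, here is a minimal toy sketch in Python. It is not the ILM implementation: `generate_target_name`, `run_with_retry`, `make_flaky_rollup`, and the naming scheme are all hypothetical names invented for this example. It only records which target index name each attempt would use.

```python
import itertools

# Hypothetical counter used to produce a fresh suffix for every generated name.
_suffix = itertools.count()


def generate_target_name(source_index):
    """Stand-in for the step that generates the downsample target index name."""
    return f"downsample-{source_index}-{next(_suffix)}"


def make_flaky_rollup(failures):
    """Build a fake rollup step that fails `failures` times, then succeeds."""
    remaining = [failures]

    def rollup(source_index, target_index):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # simulated failure, e.g. the elected master went away
        return True

    return rollup


def run_with_retry(source_index, rollup, retry_from_name_step, max_attempts=3):
    """Run the rollup with retries and return the target name used per attempt."""
    targets_used = []
    target = generate_target_name(source_index)
    for attempt in range(max_attempts):
        # Proposed behaviour: go back to the name-generation step before retrying.
        if attempt > 0 and retry_from_name_step:
            target = generate_target_name(source_index)
        targets_used.append(target)
        if rollup(source_index, target):
            break
    return targets_used


# Current behaviour: the retry reuses the target of the failed attempt.
current = run_with_retry("metrics", make_flaky_rollup(1), retry_from_name_step=False)

# Proposed behaviour: the retry gets a fresh target index name.
proposed = run_with_retry("metrics", make_flaky_rollup(1), retry_from_name_step=True)
```

In this sketch, `current` contains the same name twice, so a still-running earlier invocation and the retry would share a target, while `proposed` contains two distinct names.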

Labels

:Data Management/ILM+SLM - DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead.
:StorageEngine/Rollup - Turn fine-grained time-based data into coarser-grained data
:StorageEngine/TSDB - You know, for Metrics
>bug
Team:Analytics - Meta label for analytical engine team (ESQL/Aggs/Geo)
Team:Data Management (obsolete) - DO NOT USE. This team no longer exists.
