Aborting a Snapshot Queued after a Finalizing Snapshot is Broken

There is a bug in the concurrent snapshot logic. The following scenario, involving three concurrent snapshots and a snapshot delete, is handled incorrectly and may lead to corrupted repository metadata being written:

1. Start three snapshots of the same two indices.
2. Abort the middle snapshot after the first snapshot has finished on the data node (as far as writing to the repository goes), but before the index has left the queued state for the second snapshot.
3. Once the middle snapshot completes in the `FAILED` state, the third snapshot is started, but with a `null` shard generation for the shards of the shared index.
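The hand-off in step 3 can be illustrated with a toy model. This is a minimal sketch, not Elasticsearch's code: `ShardStatus`, `buggy_start_generation` and `fixed_start_generation` are hypothetical names, and the "fixed" variant is only one plausible invariant (start a queued snapshot from the most recent generation that was actually written), not the actual fix for this issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ShardStatus:
    """Hypothetical, simplified per-shard state of one snapshot in the queue."""
    snapshot: str
    state: str                 # e.g. "SUCCESS" or "FAILED"
    generation: Optional[str]  # shard generation written to the repository, or None


def buggy_start_generation(predecessor: ShardStatus) -> Optional[str]:
    # Copies the generation of the immediately preceding snapshot, even if that
    # snapshot was aborted while still queued and therefore never wrote one.
    return predecessor.generation


def fixed_start_generation(history: List[ShardStatus]) -> Optional[str]:
    # Walks back over earlier snapshots of this shard and returns the most
    # recent generation that was actually written, skipping entries (such as
    # an aborted, still-queued snapshot) that have no generation.
    for status in reversed(history):
        if status.generation is not None:
            return status.generation
    return None


# Snapshot 1 finished writing; snapshot 2 was aborted while still queued.
history = [
    ShardStatus("snap-1", "SUCCESS", "gen-1"),
    ShardStatus("snap-2", "FAILED", None),
]
```

With this history, `buggy_start_generation(history[-1])` hands snapshot 3 a `None` generation, matching the symptom in step 3, while `fixed_start_generation(history)` falls back to `"gen-1"` from snapshot 1.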

This is a fairly unlikely scenario to run into, since the abort must be timed just right. It is somewhat more likely if the second snapshot has a larger diff relative to the first snapshot (so writing the files takes longer), though even then the cluster state (CS) containing the abort has to be applied on the data node right after it finishes writing the last file.
=> I'll fix this asap, but probably not before next week.

Aborting a Snapshot Queued after a Finalizing Snapshot is Broken #75598

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Aborting a Snapshot Queued after a Finalizing Snapshot is Broken #75598

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions