
[8.11] WaitForSnapshotStep verifies if the index belongs to the latest snapshot of that SLM policy (#100911) #101027

Merged
elasticsearchmachine merged 1 commit into elastic:8.11 from gmarouli:backport/8.11/pr-100911 on Oct 18, 2023

Conversation

@gmarouli
Contributor

Backports the following commits to 8.11:

…pshot of that SLM policy (elastic#100911)

The `WaitForSnapshotStep` used to check if the SLM policy has been
executed after the index has entered the delete phase, but it did not
check if the SLM policy included this index.

As a result, if the user configured an SLM policy that did not include
this index, the index would enter the `WaitForSnapshotStep` and wait for
a snapshot to be taken. That snapshot would not include the index, yet
ILM would then delete the index.

See the exact reproduction path:
elastic#57809

**Solution** After finding a successful SLM run, the step now verifies
that the snapshot taken by SLM contains this index. If it does not, the
step throws an error; otherwise it proceeds.
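
The check described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual Elasticsearch implementation: the class `WaitForSnapshotSketch`, the method `verifySnapshotContainsIndex`, and the index name `.ds-my-stream-2023.10.16-000001` are hypothetical, and the real step obtains the snapshot contents asynchronously rather than from an in-memory collection. Only the error message format is taken from the ILM explain output in this PR.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: not the real WaitForSnapshotStep code.
class WaitForSnapshotSketch {

    // Assumes the caller already found the last successful snapshot of the
    // SLM policy and passes in the names of the indices it contains.
    static void verifySnapshotContainsIndex(String policy, String index, Collection<String> snapshotIndices) {
        if (!snapshotIndices.contains(index)) {
            // Same wording as the "reason" reported by ILM explain.
            throw new IllegalStateException(
                "the last successful snapshot of policy '" + policy
                    + "' does not include index '" + index + "'");
        }
        // Index is covered by the snapshot, so the step may proceed.
    }

    public static void main(String[] args) {
        List<String> snapshotIndices = List.of(".ds-my-stream-2023.10.16-000001");

        // Covered index: no exception is thrown.
        verifySnapshotContainsIndex("hourly-snapshots", ".ds-my-stream-2023.10.16-000001", snapshotIndices);

        // Index missing from the snapshot: the check fails instead of letting ILM delete it.
        try {
            verifySnapshotContainsIndex("hourly-snapshots", ".ds-my-other-stream-2023.10.16-000001", snapshotIndices);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing with an exception, rather than silently waiting, is what surfaces the misconfigured policy to the user through ILM explain.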

ILM explain will report:

```
"step_info": {
  "type": "illegal_state_exception",
  "reason": "the last successful snapshot of policy 'hourly-snapshots' does not include index '.ds-my-other-stream-2023.10.16-000001'"
}
```

**Backwards compatibility concerns** In this PR, the
`WaitForSnapshotStep` changed from `ClusterStateWaitStep` to
`AsyncWaitStep`. We do not expect this to cause an issue. This was
tested manually with the following steps:

- Run a master node with the old version.
- While ILM is executing `wait-for-snapshot`, shut down the node.
- Start the node again with the new version of ES.
- ES was able to pick up the step and continue with the new code.

We believe that this covers bwc concerns.

Fixes: elastic#57809
@gmarouli gmarouli added the :Data Management/ILM+SLM, >bug, auto-merge-without-approval, backport, and Team:Data Management (obsolete) labels Oct 18, 2023
@elasticsearchmachine elasticsearchmachine merged commit beddf45 into elastic:8.11 Oct 18, 2023
@gmarouli gmarouli deleted the backport/8.11/pr-100911 branch October 18, 2023 08:01

Labels

auto-merge-without-approval, backport, >bug, :Data Management/ILM+SLM, Team:Data Management (obsolete), v8.11.1
