Manage retention of failed snapshots in SLM #47617
AthenaEryma merged 9 commits into elastic:master
Failed snapshots will eventually build up unless they are deleted. While failures may not take up much space, they add noise to the list of snapshots and it's desirable to remove them when they are no longer useful.

With this change, failed snapshots are deleted using the following strategy:

- `FAILED` snapshots will be kept until the configured `expire_after` period has passed, if present, and then be deleted.
- If there is no configured `expire_after` in the retention policy, then they will be deleted if there is at least one more recent successful snapshot from this policy (as they may otherwise be useful for troubleshooting purposes).
- Failed snapshots are not counted towards either `min_count` or `max_count`.
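For readers who want the strategy above in code form, here is a minimal, self-contained sketch. It is not the code from this PR: the class `FailedSnapshotRetentionSketch`, the simplified `Snapshot` type, the two-value `State` enum, and both method names are hypothetical, and treating "expired" as "strictly older than `expire_after`" is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

public class FailedSnapshotRetentionSketch {

    // Simplified for illustration; the real snapshot state has more values.
    public enum State { SUCCESS, FAILED }

    // Hypothetical stand-in for snapshot metadata, not an Elasticsearch class.
    public static final class Snapshot {
        final String name;
        final State state;
        final Instant startTime;

        public Snapshot(String name, State state, Instant startTime) {
            this.name = name;
            this.state = state;
            this.startTime = startTime;
        }
    }

    /**
     * Decides whether a FAILED snapshot may be deleted.
     * expireAfter is empty when the policy has no expire_after configured.
     */
    public static boolean failedSnapshotIsDeletable(Snapshot failed,
                                                    List<Snapshot> policySnapshots,
                                                    Optional<Duration> expireAfter,
                                                    Instant now) {
        if (failed.state != State.FAILED) {
            throw new IllegalArgumentException("expected a FAILED snapshot: " + failed.name);
        }
        if (expireAfter.isPresent()) {
            // With expire_after set, keep the failure until it has aged out.
            // Assumption: "expired" means strictly older than expire_after.
            Duration age = Duration.between(failed.startTime, now);
            return age.compareTo(expireAfter.get()) > 0;
        }
        // Without expire_after, delete only once a more recent successful
        // snapshot from the same policy exists.
        return policySnapshots.stream()
            .anyMatch(s -> s.state == State.SUCCESS && s.startTime.isAfter(failed.startTime));
    }

    /** Failed snapshots are ignored when comparing against min_count and max_count. */
    public static long countTowardsMinAndMax(List<Snapshot> policySnapshots) {
        return policySnapshots.stream().filter(s -> s.state != State.FAILED).count();
    }
}
```

In this sketch, a failure that started ten days ago is deletable under a seven-day `expire_after` but kept under a thirty-day one. With no `expire_after`, it becomes deletable only once the same policy has a newer successful snapshot.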
Pinging @elastic/es-core-features (:Core/Features/ILM)
I tried to write an integration test for this, but had a heck of a time getting a failed snapshot that actually ended up in the repository. I'll look a bit more to see if there's a way I'm missing.
dakrone left a comment
Thanks for working on this Gordon, I left some comments. In terms of an integration test, I think you might be able to use MockRepository to "block" or cause errors during the snapshot the way that SLMSnapshotBlockingIntegTests does?
...ugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotRetentionConfiguration.java
I've pushed most of the changes but would still like to get an integration test in - I believe I've found a way - so you can hold off on re-reviewing until I get that in.
dakrone left a comment
LGTM, thanks for splitting some of the logic, it was easier to follow this time around.
```java
logger.info("--> start snapshot");
ActionFuture<ExecuteSnapshotLifecycleAction.Response> snapshotFuture = client()
    .execute(ExecuteSnapshotLifecycleAction.INSTANCE, new ExecuteSnapshotLifecycleAction.Request(policyId));
```
There's an `executePolicy` helper that returns the snapshot name as a `String` (for future tests).
@elasticmachine run elasticsearch-ci/1
Implements part of #46988
Labelled `non-issue` because this feature hasn't yet shipped.