Respect Master Node Timeout for Delete Snapshot Requests #55798

Merged
original-brownbear merged 1 commit into elastic:master from original-brownbear:respect-timeout-snapshot-delete
Apr 27, 2020

Conversation

@original-brownbear
Contributor

Respect master node timeout for the first cluster state update task
during a snapshot delete request like we do for snapshot create.

@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.8.0 labels Apr 27, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

Contributor

@ywelsch ywelsch left a comment

While the change is good, I think we need to be careful how it affects existing orchestration (SLM and other Cloud snapshot orchestration) which has possibly relied on there being no timeout. Can you double-check that?

@original-brownbear
Contributor Author

Thanks Yannick!

> Can you double-check that?

I already did, and I think this is fine. For one, even though the timeout applies here now, it already applied before in the transport master node action (when waiting for a block to clear or for a master node to get elected).
But SLM specifically won't be broken by this. It handles all exceptions the exact same way. It might even be a good thing that it gives up if the delete doesn't start within the default 30s instead of waiting forever and adding load at a time when the master is already very busy (the delete will get retried on the next snapshot delete cycle).
Cloud orchestration should be fine with this as well; it retries failed deletes via its periodic checks for snapshots to delete.
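
To illustrate the behavior described above, here is a minimal sketch in plain Java, not actual Elasticsearch code. It assumes a single-threaded executor as a stand-in for the master's cluster state update queue; the class `MasterTimeoutSketch` and its `submit` method are hypothetical names invented for this example. A task that cannot start within its timeout fails with a `TimeoutException`, and the caller simply tries again later, much like an orchestrator retrying on its next delete cycle.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class MasterTimeoutSketch {

    // Stand-in for the master's single-threaded cluster state update queue.
    private final ExecutorService masterQueue = Executors.newSingleThreadExecutor();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    // Queues a task that fails with TimeoutException if it has not started within the timeout.
    CompletableFuture<String> submit(String source, long timeoutMillis) {
        CompletableFuture<String> result = new CompletableFuture<>();
        // Whichever side wins this flag decides whether the task starts or times out.
        AtomicBoolean claimed = new AtomicBoolean(false);
        timer.schedule(() -> {
            if (claimed.compareAndSet(false, true)) {
                result.completeExceptionally(
                    new TimeoutException(source + " did not start within " + timeoutMillis + "ms"));
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        masterQueue.execute(() -> {
            if (claimed.compareAndSet(false, true)) {
                result.complete(source + " started");
            }
        });
        return result;
    }

    void shutdown() {
        masterQueue.shutdownNow();
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        MasterTimeoutSketch master = new MasterTimeoutSketch();
        CountDownLatch busy = new CountDownLatch(1);
        // Simulate a busy master by blocking the queue with a long-running task.
        master.masterQueue.execute(() -> {
            try {
                busy.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            master.submit("delete_snapshot", 100).join();
        } catch (CompletionException e) {
            // The caller gives up instead of waiting forever.
            System.out.println("gave up: " + e.getCause().getMessage());
        }

        // Later, once the master is no longer busy, the retry goes through.
        busy.countDown();
        System.out.println(master.submit("delete_snapshot", 1000).join());
        master.shutdown();
    }
}
```

In this sketch the timed-out task stays in the queue but becomes a no-op once it is reached, so a late start cannot race with the failure already reported to the caller.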

@original-brownbear original-brownbear merged commit 182835c into elastic:master Apr 27, 2020
@original-brownbear original-brownbear deleted the respect-timeout-snapshot-delete branch April 27, 2020 13:31
@original-brownbear original-brownbear restored the respect-timeout-snapshot-delete branch August 6, 2020 18:23

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v7.8.0 v8.0.0-alpha1
