Enhance SnapshotResiliencyTests #49514
original-brownbear merged 5 commits into elastic:master from original-brownbear:more-resiliency-testing
Conversation
A few enhancements to `SnapshotResiliencyTests`:

1. Test running requests from random nodes in more spots to improve coverage (particularly motivated by #49060, where the additional number of cluster state updates makes it more interesting to fully cover all kinds of network failures).
2. Fix an issue with restarting only the master node in one test (doing so breaks the test at an incredibly low frequency, which becomes not so low in #49060 with the additional cluster state updates between request and response).
3. Improve cluster formation checks (now properly checking the term as well when forming the cluster) and make sure all nodes are connected to all other nodes (previously the data nodes would at times not be connected to other data nodes, which was shaken out by adding the `client()` method).
4. Make sure the cluster left behind by the test makes sense by running the repository cleanup action on it (this also increases coverage of the repository cleanup action and lays the groundwork for making it part of more resiliency tests).
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
Jenkins run elasticsearch-ci/packaging-sample-matrix |
**ywelsch** left a comment:

Left one comment, looking good otherwise.
```java
clearDisruptionsAndAwaitSync();

final StepListener<CleanupRepositoryResponse> cleanupResponse = new StepListener<>();
client().admin().cluster().cleanupRepository( // ...
```
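For readers unfamiliar with the pattern in the snippet above: `StepListener` lets a test register a callback for an asynchronous response and chain assertions on it once the step completes. Below is a minimal, illustrative stand-in (the names `MiniStepListener`, `whenComplete`, and `onResponse` are simplifications for this sketch; the real `StepListener` in Elasticsearch has a richer API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal, illustrative stand-in for a step listener: callbacks are
// registered before or after the step completes, and the producer
// completes the step exactly once with a response.
final class MiniStepListener<T> {
    private final List<Consumer<T>> callbacks = new ArrayList<>();
    private boolean done;
    private T result;

    // Register a callback to run when the step completes successfully.
    void whenComplete(Consumer<T> callback) {
        if (done) {
            callback.accept(result);
        } else {
            callbacks.add(callback);
        }
    }

    // Complete the step, notifying all registered callbacks.
    void onResponse(T response) {
        if (done) {
            throw new IllegalStateException("step already completed");
        }
        done = true;
        result = response;
        callbacks.forEach(c -> c.accept(response));
    }
}

public class StepListenerSketch {
    public static void main(String[] args) {
        MiniStepListener<String> cleanupStep = new MiniStepListener<>();
        // A test would hand the listener to an async API (such as the
        // cleanupRepository call above) and chain its assertions here.
        cleanupStep.whenComplete(response -> System.out.println("cleanup result: " + response));
        cleanupStep.onResponse("0 blobs removed"); // simulate the async response arriving
    }
}
```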
I wonder if we should check before the clean-up whether the repo is not corrupted (i.e. does not have meta files pointing at non-existing data files).
> i.e. does not have meta files pointing at non-existing data files

This is not something the cleanup will ever resolve. The root `index-N` now points at the UUID-named generations of shard-level metadata. Neither of these changes in content as a result of the cleanup action (only a new `index-N` with the same content as the previous file is written).
-> I think it's fine to test things this way around so long as we don't do any repairs in the cleanup. We do the same in all other tests as well (all the `SharedClusterSnapshotRestoreIT` tests and such).
The point is to have stronger assertions: asserting something about the state of the repository before we do any clean-up (and also something that will hold at any point during snapshotting, snapshot deletion, etc., which we could check at any time). Having two separate sets of assertions also better shows what is guaranteed in a situation without clean-up and what the clean-up is essentially adding. You can do this in a follow-up; I feel, however, that mixing the two into one here weakens the coverage of these tests.
Makes sense, will add that in a follow up :)
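The two-phase assertion structure suggested above could be sketched as follows. This is a hypothetical, simplified model: the `Map`-of-blobs representation and the helper names are illustrative only, not Elasticsearch APIs; the real checks would inspect the repository's `index-N` and shard-level metadata blobs.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of two-phase repository verification:
// phase 1 asserts invariants that must hold at any time (no dangling
// metadata references), phase 2 runs cleanup and asserts the stronger
// post-cleanup invariant (no unreferenced data blobs remain).
public class RepoConsistencySketch {

    // Stand-in for "meta files do not point at non-existing data files".
    static boolean metadataReferencesResolve(Map<String, Set<String>> metaToData,
                                             Set<String> dataBlobs) {
        return metaToData.values().stream().allMatch(dataBlobs::containsAll);
    }

    // Stand-in for cleanup: drop data blobs not referenced by any metadata.
    static Set<String> cleanup(Map<String, Set<String>> metaToData,
                               Set<String> dataBlobs) {
        Set<String> referenced = new HashSet<>();
        metaToData.values().forEach(referenced::addAll);
        Set<String> kept = new HashSet<>(dataBlobs);
        kept.retainAll(referenced);
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> metaToData = Map.of("index-5", Set.of("blob-a"));
        Set<String> dataBlobs = new HashSet<>(Set.of("blob-a", "stale-blob"));

        // Phase 1: invariant that holds at any point, even before cleanup.
        if (!metadataReferencesResolve(metaToData, dataBlobs)) {
            throw new AssertionError("metadata points at missing data blobs");
        }

        // Phase 2: after cleanup, only referenced blobs remain.
        Set<String> afterCleanup = cleanup(metaToData, dataBlobs);
        if (!afterCleanup.equals(Set.of("blob-a"))) {
            throw new AssertionError("cleanup left unreferenced blobs behind");
        }
    }
}
```

Splitting the assertions this way makes it visible which guarantee already holds without cleanup and which one cleanup adds.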
Thanks Yannick!