
Fix TODO about Spurious FAILED Snapshots#58994

Merged
original-brownbear merged 14 commits into elastic:master from original-brownbear:remove-snapshot-spurious-failed
Jul 8, 2020

Conversation

@original-brownbear
Contributor

There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past, when there was an `INIT` snapshot
step that wrote data to the repository that would've otherwise become unreferenced, but in the
current-day state machine without the `INIT` step there is no point in doing so.
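As a rough illustration of the changed behavior (a hypothetical sketch with made-up names, not the actual Elasticsearch `SnapshotsService` code): a snapshot whose shards all failed contains nothing restorable, so the create request can be rejected up front instead of persisting a `FAILED` snapshot to the repository.

```java
import java.util.List;

// Hypothetical sketch of the decision, not the real Elasticsearch logic.
class SnapshotWriteDecision {
    enum ShardState { SUCCESS, FAILED }

    // Only write snapshot metadata to the repository if at least one shard
    // actually produced restorable data; an all-FAILED snapshot is useless.
    static boolean shouldWriteSnapshot(List<ShardState> shards) {
        return shards.stream().anyMatch(s -> s == ShardState.SUCCESS);
    }
}
```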
@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.9.0 labels Jul 3, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Jul 3, 2020
@original-brownbear
Contributor Author

@tlrx I know we discussed this before and there was some worry about Cloud here. I think that's a non-issue since Cloud is using partial snapshots anyway ever since 7.6, so this change doesn't affect them :)

Contributor

@ywelsch ywelsch left a comment

I've left a few comments, generally looking good though.

State.FAILED, indexIds, dataStreams, threadPool.absoluteTimeInMillis(), repositoryData.getGenId(), shards,
"Indices don't have primary shards " + missing, userMeta, version);
throw new SnapshotException(
new Snapshot(repositoryName, snapshotId),"Indices don't have primary shards " + missing);
Contributor

space missing.

assertEquals(SnapshotState.SUCCESS, getSnapshotsResponse.getSnapshots("test-repo-2").get(0).state());
}

public void testSnapshotStatusOnFailedIndex() throws Exception {
Contributor

While this test used the old behavior to get a failed snapshot, it is still a useful test for listing good and bad snapshots, no?

Contributor Author

I guess the problem I had was that there was no way of creating a FAILED snapshot any longer and the whole premise of this test was to check that the status of a FAILED snapshot is returned properly from APIs.
Then again, as with the SLM test that I removed, let me see if I can create a BwC test for this by manipulating RepositoryData :)

Contributor Author

Alright, brought this back in a much simplified way to make sure we continue to be able to read the FAILED state. I think that's all we need here. We already test reading failed shard state in all kinds of places where we deal with PARTIAL snapshots, so I think just faking a zero-shard, FAILED snapshot here is good enough.
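The BwC concern sketched in miniature (hypothetical names, not the real `RepositoryData` wire format): new versions stop writing `FAILED`, but the parser must keep accepting it from repositories written by older versions.

```java
// Hypothetical sketch: readers must still accept the legacy "FAILED" value
// even though current writers never emit it.
class LegacySnapshotState {
    enum State { IN_PROGRESS, SUCCESS, PARTIAL, FAILED }

    static State parse(String name) {
        // Enum.valueOf keeps round-tripping "FAILED" written by old clusters.
        return State.valueOf(name);
    }
}
```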

}
}

public void testBasicFailureRetention() throws Exception {
Contributor

should we still test this scenario? Can folks disable partial snapshots on SLM?

Contributor Author

I don't think we have to; no more FAILED-state snapshots are created in the repo with this change (note you could only ever get a FAILED snapshot if partial was turned off in the first place). We could force the creation of a FAILED snapshot as a BwC test, maybe by manually messing with the RepositoryData (come to think of it ... I'll do that, otherwise we lose coverage for a BwC scenario).

Contributor Author

This needs some trickier tests, it turns out, but it's well worth it given how SLM manages these snapshot states at a low level. I opened #59082 to enable the necessary test infrastructure.

original-brownbear added a commit that referenced this pull request Jul 7, 2020
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
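The shared "wait for no running snapshot operations" helper mentioned in that commit could look roughly like the following polling loop (a hedged sketch with a hypothetical `runningCount` supplier, not the actual test-infrastructure code, which watches cluster state instead):

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of awaiting quiescence of snapshot operations.
class AwaitNoRunningSnapshots {
    static boolean await(IntSupplier runningCount, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (runningCount.getAsInt() == 0) {
                return true;
            }
            try {
                Thread.sleep(10); // back off between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return runningCount.getAsInt() == 0;
    }
}
```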
original-brownbear added a commit that referenced this pull request Jul 7, 2020
…59119)

For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
failedSnapshotName.set(snapshotFuture.get().getSnapshotName());
assertNotNull(failedSnapshotName.get());
} else {
final String snapshotName = "failed-snapshot-1";
Contributor Author

Faking the FAILED snapshot with the right metadata so that SLM picks it up here now. It's a bit of a corner case since we won't be creating any new FAILED snapshots but probably nice to have this tested to make sure that in rolling upgrade scenarios FAILED snapshots are getting cleaned up eventually.
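The rolling-upgrade cleanup described above, in miniature (hypothetical types; the real logic lives in SLM's retention task): FAILED snapshots left behind by an older cluster should still be selected for deletion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of retention picking up legacy FAILED snapshots.
class RetentionSketch {
    // Each entry: { name, state }. FAILED entries can only come from
    // repositories written before this change and are safe to delete.
    static List<String> failedSnapshotsToDelete(List<String[]> snapshots) {
        List<String> toDelete = new ArrayList<>();
        for (String[] snap : snapshots) {
            if ("FAILED".equals(snap[1])) {
                toDelete.add(snap[0]);
            }
        }
        return toDelete;
    }
}
```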

Contributor

@ywelsch ywelsch left a comment

LGTM

@original-brownbear
Contributor Author

Thanks Yannick!

@original-brownbear original-brownbear merged commit 02539fa into elastic:master Jul 8, 2020
@original-brownbear original-brownbear deleted the remove-snapshot-spurious-failed branch July 8, 2020 11:13
original-brownbear added a commit that referenced this pull request Jul 14, 2020
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past, when there was an `INIT` snapshot
step that wrote data to the repository that would've otherwise become unreferenced, but in the
current-day state machine without the `INIT` step there is no point in doing so.
@original-brownbear original-brownbear restored the remove-snapshot-spurious-failed branch August 6, 2020 18:35

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed Meta label for distributed team. v7.9.0 v8.0.0-alpha1
