Add Check for Metadata Existence in BlobStoreRepository #59141

Merged
original-brownbear merged 3 commits into elastic:master from original-brownbear:multi-cluster-corruption-safety-tests
Jul 8, 2020

Conversation

@original-brownbear
Contributor

@original-brownbear original-brownbear commented Jul 7, 2020

In order to ensure that we do not write a broken piece of RepositoryData
because the physical repository generation was moved ahead by more than one step
through erroneous concurrent writes to the repository, we must check whether
the currently assumed repository generation physically exists in the repository.
Without this check we run the risk of writing on top of stale cached repository data.

The exists checks are the ones we employed until removing them in 4b8fd4e, so this PR is a partial revert of that change.

Relates #56911
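The safety check described above can be sketched as follows. This is a hypothetical, simplified model (a HashMap stands in for the blob container, and the class and method names are invented for illustration), not the actual BlobStoreRepository code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the generation-existence check: before writing the
// next RepositoryData blob, verify that the blob for the generation we assume
// to be current actually exists. If it does not, another writer has moved the
// repository ahead and we must not write on top of stale cached state.
public class RepoGenerationCheck {

    // stand-in for the repository's blob container
    private final Map<String, byte[]> blobs = new HashMap<>();

    public void putIndexBlob(long generation, byte[] data) {
        blobs.put("index-" + generation, data);
    }

    // Writes the next generation, or throws if the expected current
    // generation is not physically present in the repository.
    public long writeNextGeneration(long expectedGeneration, byte[] data) {
        if (expectedGeneration >= 0 && blobs.containsKey("index-" + expectedGeneration) == false) {
            throw new IllegalStateException("concurrent modification of the index-N file, expected generation ["
                + expectedGeneration + "] was not found in the repository");
        }
        long newGeneration = expectedGeneration + 1;
        blobs.put("index-" + newGeneration, data);
        return newGeneration;
    }
}
```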

WIP for a sec, I'd like CI to run this in full first before requesting reviews.

@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.9.0 labels Jul 7, 2020

@Override
public boolean blobExists(String blobName) throws IOException {
    return store.execute(fileContext -> fileContext.util().exists(new Path(path, blobName)));
}
Contributor Author

This one we used to do differently in https://github.com/original-brownbear/elasticsearch/commit/4b8fd4e76f1e344d8994486f28c96d950303cf1a#diff-a6d76133025d0cd3d4c12918d42be05bL66 ... but just catching any IOException and returning false seemed broken to me, so I didn't bring that back. Especially since this may now trigger marking a repository as corrupted, which we don't want to happen on e.g. a network issue with HDFS.
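To illustrate the distinction this comment draws, here is a hypothetical sketch (not the actual repository-hdfs plugin code) contrasting the old swallow-the-exception style with the style kept in this PR:

```java
import java.io.IOException;

// Hypothetical sketch of why blobExists should propagate IOException
// rather than swallow it. FileSystemLike is an invented stand-in for the
// underlying HDFS file context.
public class BlobExistsSketch {

    public interface FileSystemLike {
        boolean exists(String path) throws IOException;
    }

    // Old style: any I/O failure is reported as "blob does not exist".
    // A transient network problem with HDFS would then look like a missing
    // generation and could wrongly mark the repository as corrupted.
    public static boolean blobExistsLenient(FileSystemLike fs, String path) {
        try {
            return fs.exists(path);
        } catch (IOException e) {
            return false; // dangerous: hides the real failure
        }
    }

    // Style kept in this PR: let the IOException propagate so callers can
    // distinguish "blob absent" from "could not check".
    public static boolean blobExistsStrict(FileSystemLike fs, String path) throws IOException {
        return fs.exists(path);
    }
}
```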

private Path repoPath;

@Before
public void startSecondCluster() throws IOException, InterruptedException {
Contributor Author

This may be a little over the top since I'm not making heavy use of the second cluster right now (I could achieve the same test by just moving the index-N blob ahead by two generations or mounting a second repository at the same path), but I figured it's safer and easier to maintain to do the real thing here, and we can extend this to cover more spots when we tighten the safety measures further in #57786.
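The scenario this test exercises could be modeled roughly like this. A minimal, hypothetical sketch with invented names, where one shared map stands in for the repository path that both "clusters" mount:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the test scenario: two "clusters" mount the same
// repository path. Each remembers the generation it last saw; once the
// second cluster moves the repository ahead and old generations disappear,
// the first cluster's next write from its stale cached generation must fail
// instead of silently corrupting the repository.
public class SharedRepoScenario {

    private final Map<String, byte[]> sharedRepoPath = new HashMap<>();

    // Writes the generation after cachedGeneration, failing if that cached
    // generation no longer physically exists in the shared repository.
    public long writeGeneration(long cachedGeneration) {
        if (cachedGeneration >= 0 && sharedRepoPath.containsKey("index-" + cachedGeneration) == false) {
            throw new IllegalStateException("stale cached generation [" + cachedGeneration
                + "]: concurrent repository access suspected");
        }
        long next = cachedGeneration + 1;
        sharedRepoPath.put("index-" + next, new byte[0]);
        return next;
    }

    public void deleteGeneration(long generation) {
        sharedRepoPath.remove("index-" + generation);
    }
}
```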

@original-brownbear original-brownbear marked this pull request as ready for review July 7, 2020 13:23
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Jul 7, 2020
Contributor

@ywelsch ywelsch left a comment


I've left 3 small comments, o.w. looking good.

secondCluster.client().admin().cluster().prepareDeleteSnapshot(repoNameOnSecondCluster, "snap-1").get();
secondCluster.client().admin().cluster().prepareDeleteSnapshot(repoNameOnSecondCluster, "snap-2").get();

expectThrows(SnapshotException.class, () ->
Contributor

can we assert that the message here is something about concurrent repo access?

Contributor Author

Sure, done :)
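The suggested assertion could look roughly like this. expectThrows below is a simplified stand-in for the ESTestCase helper, and the exception message is illustrative, not the exact one produced by BlobStoreRepository:

```java
// Hypothetical sketch of asserting not just that an exception is thrown,
// but that its message points at concurrent repository access.
public class ExpectThrowsSketch {

    // Simplified stand-in for the ESTestCase expectThrows helper: runs the
    // code, returns the thrown exception if it has the expected type, and
    // fails the test otherwise.
    public static <T extends Throwable> T expectThrows(Class<T> expectedType, Runnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("expected " + expectedType.getName() + " but nothing was thrown");
    }
}
```

With a helper like this, the review comment amounts to capturing the returned exception and asserting on its message, e.g. with a containsString-style matcher in the real test.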

Contributor

@ywelsch ywelsch left a comment


LGTM

@original-brownbear
Contributor Author

Thanks Yannick!

@original-brownbear original-brownbear merged commit 5da804b into elastic:master Jul 8, 2020
@original-brownbear original-brownbear deleted the multi-cluster-corruption-safety-tests branch July 8, 2020 11:17
original-brownbear added a commit that referenced this pull request Jul 8, 2020

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed Meta label for distributed team. v7.9.0 v8.0.0-alpha1


4 participants