Add Check for Metadata Existence in BlobStoreRepository#59141
Add Check for Metadata Existence in BlobStoreRepository#59141original-brownbear merged 3 commits intoelastic:masterfrom original-brownbear:multi-cluster-corruption-safety-tests
Conversation
In order to ensure that we do not write a broken piece of `RepositoryData` because the phyiscal repository generation was moved ahead more than one step by erroneous concurrent writing to a repository we must check whether or not the current assumed repository generation exists in the repository physically. Without this check we run the risk of writing on top of stale cached repository data. Relates #56911
|
|
||
| @Override | ||
| public boolean blobExists(String blobName) throws IOException { | ||
| return store.execute(fileContext -> fileContext.util().exists(new Path(path, blobName))); |
There was a problem hiding this comment.
This one we used to do differently in https://github.com/original-brownbear/elasticsearch/commit/4b8fd4e76f1e344d8994486f28c96d950303cf1a#diff-a6d76133025d0cd3d4c12918d42be05bL66 ... but just catching any io exception and returning false seemed broken to me so I didn't bring that back. Especially since this may now trigger marking a repository as corrupted which we don't want to happen on e.g. a network issue with HDFS.
| private Path repoPath; | ||
|
|
||
| @Before | ||
| public void startSecondCluster() throws IOException, InterruptedException { |
There was a problem hiding this comment.
This may be a little over the top since I'm not making heavy use of the second cluster now (could achieve the same test by just moving the index-N blob by two generations or mounting a second repo to the same path) but I figured it's safer and easier to maintain to do the real thing here and we can extend this to cover more spots when we tighten the safety measures further in #57786
|
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
ywelsch
left a comment
There was a problem hiding this comment.
I've left 3 small comments, o.w. looking good.
...es/repository-url/src/main/java/org/elasticsearch/common/blobstore/url/URLBlobContainer.java
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/snapshots/MultiClusterRepoAccessIT.java
Outdated
Show resolved
Hide resolved
| secondCluster.client().admin().cluster().prepareDeleteSnapshot(repoNameOnSecondCluster, "snap-1").get(); | ||
| secondCluster.client().admin().cluster().prepareDeleteSnapshot(repoNameOnSecondCluster, "snap-2").get(); | ||
|
|
||
| expectThrows(SnapshotException.class, () -> |
There was a problem hiding this comment.
can we assert that the message here is something about concurrent repo access?
There was a problem hiding this comment.
Sure, done :)
…uption-safety-tests
|
Thanks Yannick! |
) In order to ensure that we do not write a broken piece of `RepositoryData` because the phyiscal repository generation was moved ahead more than one step by erroneous concurrent writing to a repository we must check whether or not the current assumed repository generation exists in the repository physically. Without this check we run the risk of writing on top of stale cached repository data. Relates #56911
In order to ensure that we do not write a broken piece of
RepositoryDatabecause the physical repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.
The exists checks are the ones we used to employ until we removed the check in 4b8fd4e so this PR is a partial revert of that change.
Relates #56911
WIP for a sec, I'd like CI to run this full first before requesting reviews