Make BlobStoreRepository Aware of ClusterState (#49639)
original-brownbear merged 14 commits into elastic:master from original-brownbear:repo-uses-cs-light
Conversation
This is a preliminary to #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is add all the infrastructure changes around passing the cluster service to the blob store, the associated test changes, and a best-effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency with which non-master nodes (or the master directly after a failover) can determine the latest repository generation. It does not, however, do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy; those are left out to keep this change simple. This change does not in any way alter the behavior of the blob store repository other than adding a better "guess" for the value of the latest repo generation, and is mainly intended to isolate the actual logical change to how the repository operates in #49060.
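The "best effort" generation tracking described above amounts to keeping a best-known generation that only ever moves forward as cluster state updates arrive. A minimal sketch of that idea (class, field, and method names here are illustrative, not the actual BlobStoreRepository API):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of forward-only repository generation tracking;
// names are hypothetical, not the real BlobStoreRepository fields.
class RepoGenTracker {
    // -1 stands in for RepositoryData.EMPTY_REPO_GEN (no generation known yet)
    private final AtomicLong latestKnownRepoGen = new AtomicLong(-1L);

    // Called whenever a cluster state update hints at a repository generation;
    // the value is only ever ratcheted upward, never backward, so a stale
    // update from an older cluster state cannot regress the tracked value.
    long updateFromClusterState(long genFromClusterState) {
        return latestKnownRepoGen.updateAndGet(known -> Math.max(known, genFromClusterState));
    }

    long latest() {
        return latestKnownRepoGen.get();
    }
}
```

The forward-only ratchet is what makes this safe as a "guess": a node that sees updates out of order still converges on the highest generation it has observed.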
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@ywelsch @tlrx I know that the practical improvement to resiliency from this change is pretty small (see the PR description; we're always picking up the best known state before an operation), but I think it should take a lot of the pain out of reviewing #49060 by containing almost all of its non-functional changes here. Let me know what you think :)
ywelsch left a comment
Left some comments. Thanks!
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java (resolved comment)
// Inspects all cluster state elements that contain a hint about what the current repository generation is and updates
// #latestKnownRepoGen if a newer than currently known generation is found
@Override
public void applyClusterState(ClusterChangedEvent event) {
I wonder if RepositoriesService should update the relevant repository about changes to their snapshots. RepositoriesService is already a cluster state listener, which means that we don't need an additional lifecycle here.
This would also be closer to how we inform shards about updates (see IndexShard.updateShardState)
I thought about that initially but then dropped the idea because that would've meant leaking the specifics of BlobStoreRepository into RepositoriesService.
But come to think of it now ... doing this would also save a bit of tricky code in #49060 to get the initialization of the repo on non-master nodes right.
Let's do it -> will do :)
> I wonder if RepositoriesService should update the relevant repository about changes to their snapshots.

I agree, that seems to be the right thing to do. I'm also not super happy to have BlobStoreRepository have its own lifecycle.
Could I warm your heart for cfb104f maybe? :)
I think in the case of the repositories it's better to pass the whole ClusterState to the repo instead of figuring out what parts go to what repo in RepositoriesService. Otherwise we needlessly waste cycles on the repos that don't require the cluster state (e.g. read-only repos, ccr, ...). Also, in #49060 the parts of the CS that need to be inspected will change, but we may want to keep the current logic from here as BwC fallback which is much easier if we just pass the full ClusterState down.
Also, I realized that I actually had to move the cluster state application method to Repository because of FilterRepository (and upcoming encrypted repo wrapper etc.) so this didn't leak any blob store repo specifics to RepositoriesService after all :)
final SnapshotsInProgress snapshotsInProgress = state.custom(SnapshotsInProgress.TYPE);
if (snapshotsInProgress != null) {
    final SnapshotsInProgress.Entry entry = snapshotsInProgress.entries().stream()
        .filter(e -> e.snapshot().getRepository().equals(repoName)).findFirst()
why findFirst? Let's take the max of all ongoing snapshots for this repo?
Well, currently there's just one entry here at all times. I don't think that will change before this logic gets replaced by the more involved logic in #49060, and I'm not sure that, if and when we move to parallel snapshot taking, those entries will in fact have different repository generations set (in my prototype for parallel ops they wouldn't have, at least). Could just assert that the count of snapshots in progress is always 1 here? :)
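Yannick's suggestion of taking the max rather than findFirst could look roughly like the following (a simplified sketch: the record below is a hypothetical stand-in for the real SnapshotsInProgress.Entry, which exposes snapshot().getRepository() and a repository state id):

```java
import java.util.List;

class MaxGenExample {
    // Hypothetical stand-in for SnapshotsInProgress.Entry.
    record Entry(String repository, long repositoryStateId) {}

    // Take the highest generation among all in-progress snapshots for this
    // repository, instead of relying on findFirst() returning the only entry.
    static long bestGeneration(List<Entry> entries, String repoName, long emptyRepoGen) {
        return entries.stream()
            .filter(e -> e.repository().equals(repoName))
            .mapToLong(Entry::repositoryStateId)
            .max()
            .orElse(emptyRepoGen);
    }
}
```

With at most one entry per repository today the two approaches agree; the max-based form simply stays correct if parallel snapshots ever carry differing generations.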
}

final SnapshotDeletionsInProgress deletionsInProgress = state.custom(SnapshotDeletionsInProgress.TYPE);
if (bestGenerationFromCS == RepositoryData.EMPTY_REPO_GEN && deletionsInProgress != null) {
let's remove this extra condition (bestGenerationFromCS == RepositoryData.EMPTY_REPO_GEN); it's an optimization which does not matter I think and could hurt us later.
No, actually that's not true I think. We have the situation where, if you abort a snapshot, the delete entry will be created with currentRepoGeneration + 1, since that's what the repo generation will be when the delete actually runs. That's why I added this.
I don't understand this condition. Why are we not just taking the max of all the entries that we see? Also, I suppose that we don't allow concurrent operations right now, so we assume that we have only one of these 3 metadata for the current repo?
There is the one special case of aborting a snapshot, where you can have an in-progress snapshot and a delete at the same time. The delete will contain the future repository generation for after the snapshot finishes -> we can't use that. That's why I added this condition.
Can we fix that situation? And can you add a comment to that effect? The condition as it stands here today is very unintuitive.
Can we fix that situation?
Yes, once we have #49060 we can clean this up when implementing concurrent operations on the repo. That will require the whole business of associating an operation with a strict repo generation to go away anyway.
=> Added a comment for now :)
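The special case discussed above (an aborted snapshot producing a delete entry that carries currentRepoGeneration + 1, a generation that does not exist in the repository yet) is why the deletion entry is only consulted when nothing better is known. A hedged sketch of that decision, with hypothetical names:

```java
// Sketch of the guard discussed above; names and constants are illustrative.
class DeletionGenGuard {
    static final long EMPTY_REPO_GEN = -1L;

    // Only fall back to the generation recorded in a deletion entry when no
    // in-progress snapshot supplied a generation: an aborted snapshot's delete
    // entry points one generation into the future, so it must not win over a
    // generation taken from a snapshot entry.
    static long pick(long bestGenerationFromCS, Long deletionRepositoryStateId) {
        if (bestGenerationFromCS == EMPTY_REPO_GEN && deletionRepositoryStateId != null) {
            return deletionRepositoryStateId;
        }
        return bestGenerationFromCS;
    }
}
```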
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java (resolved comment)
tlrx left a comment
This looks good already, I'm waiting for Yannick's feedback to be addressed before LGTM :)
...y-azure/src/test/java/org/elasticsearch/repositories/azure/AzureRepositorySettingsTests.java (resolved comment)
    throw new RepositoryException(repositoryMetaData.name(),
        "repository type [" + repositoryMetaData.type() + "] does not exist");
}
boolean success = false;
This isn't strictly necessary with the changes just now, but maybe fine to keep it here since we discovered it here, and I think we should definitely clean up on a failed start out of principle?
}
repositories = Collections.unmodifiableMap(builder);
for (Repository repo : repositories.values()) {
    repo.updateState(state);
I think we need to catch a potential RepositoryException here, log it and continue to update the other repositories.
I misread the code, sorry... But it maybe still worth it, just in case?
I wonder, looking at the code above: we aren't defensive around closeRepository either (though that might be called in a loop as well), but we do catch (with a TODO about it) on createRepository.
Maybe we should rather wrap this in a

try {
    repo.updateState(state);
} catch (Exception e) {
    assert false;
    throw e;
}

and do the same for closeRepository? Could do it in a follow-up and remove that TODO while we're at it :)
My main objection to just catching and logging here would be: once the state update is important for the proper functioning of blob store repositories, what would an uncaught exception even mean? (IMO it would mean that the repo can't be used for writes any longer ... but that would be something the repo would have to set in its internal state when handling exceptions.)
The bigger issue I think is to get rid of the condition:
if ((oldMetaData == null && newMetaData == null) || (oldMetaData != null && oldMetaData.equals(newMetaData))) {
which currently compares the current with the previous cluster state. If anything went wrong applying the previous cluster state, we will not update the repo again.
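The pitfall Yannick describes can be paraphrased in a small sketch (hypothetical names; the real check lives in RepositoriesService.applyClusterState and compares RepositoriesMetaData from the old and new cluster states):

```java
import java.util.Objects;

// Paraphrase of the short-circuit discussed above; names are illustrative.
class MetaDataSkipCheck {
    // If the repositories metadata did not change between the previous and the
    // current cluster state, the whole repository update is skipped. The pitfall:
    // if applying the *previous* state failed partway, this check still sees
    // "no change" and the repositories are never brought back in sync.
    static boolean shouldSkipUpdate(Object oldMetaData, Object newMetaData) {
        return Objects.equals(oldMetaData, newMetaData);
    }
}
```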
tlrx left a comment
LGTM, thanks Armin. I left a small comment where I think we should be defensive just in case an exception is thrown while updating the repositories states.
Thanks Armin! As discussed, I think we should have a look at the other issues in the applyClusterState() method (like maybe only creating + updating repositories before closing the existing ones); that could be done in a follow-up PR.
Jenkins run elasticsearch-ci/1 (unrelated transport test failure)
server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java (resolved comment)
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java (resolved comment)
Jenkins run elasticsearch-ci/bwc (seems like a messed-up version got built ...)
Something's messed up with the BwC tests here (some version constant test), not related to my changes.
Thanks Yannick + Tanguy!