Make BlobStoreRepository Aware of ClusterState (#49639)
original-brownbear merged 14 commits into elastic:master from original-brownbear:repo-uses-cs-light
Conversation
This is a preliminary to #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is add all the infrastructure changes around passing the cluster service to the blob store, the associated test changes, and a best-effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency with which non-master nodes (or the master directly after a failover) can determine the latest repository generation. It does not, however, do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy; those are left out to keep this change simple. This change does not in any way alter the behavior of the blob store repository other than adding a better "guess" for the value of the latest repo generation, and is mainly intended to isolate the actual logical change to how the repository operates in #49060.
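The "best effort" generation tracking described above amounts to keeping a best-known generation that only ever moves forward as cluster state updates arrive. A minimal sketch of that idea (class, field, and method names here are illustrative, not the actual BlobStoreRepository API):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of forward-only repository generation tracking;
// names are hypothetical, not the real BlobStoreRepository fields.
class RepoGenTracker {
    // -1 stands in for RepositoryData.EMPTY_REPO_GEN (no generation known yet)
    private final AtomicLong latestKnownRepoGen = new AtomicLong(-1L);

    // Called whenever a cluster state update hints at a repository generation;
    // the value is only ever ratcheted upward, never backward, so a stale
    // update from an older cluster state cannot regress the tracked value.
    long updateFromClusterState(long genFromClusterState) {
        return latestKnownRepoGen.updateAndGet(known -> Math.max(known, genFromClusterState));
    }

    long latest() {
        return latestKnownRepoGen.get();
    }
}
```

The forward-only ratchet is what makes this safe as a "guess": a node that sees updates out of order still converges on the highest generation it has observed.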
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@ywelsch @tlrx I know that the practical improvement to resiliency from this change is pretty small (see the PR description; we're always picking up the best known state before an operation), but I think it should take a lot of the pain out of reviewing #49060 by containing almost all of its non-functional changes here. Let me know what you think :)
ywelsch left a comment
Left some comments. Thanks!
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java (resolved comment)
// Inspects all cluster state elements that contain a hint about what the current repository generation is and updates
// #latestKnownRepoGen if a newer than currently known generation is found
@Override
public void applyClusterState(ClusterChangedEvent event) {
I wonder if RepositoriesService should update the relevant repository about changes to their snapshots. RepositoriesService is already a cluster state listener, which means that we don't need an additional lifecycle here.
This would also be closer to how we inform shards about updates (see IndexShard.updateShardState)
I thought about that initially but then dropped the idea because that would've meant leaking the specifics of BlobStoreRepository into RepositoriesService.
But come to think of it now ... doing this would also save a bit of tricky code in #49060 to get the initialization of the repo on non-master nodes right.
Let's do it -> will do :)
> I wonder if RepositoriesService should update the relevant repository about changes to their snapshots.

I agree, that seems to be the right thing to do. I'm also not super happy to have BlobStoreRepository have its own lifecycle.
Could I warm your heart for cfb104f maybe? :)
I think in the case of the repositories it's better to pass the whole ClusterState to the repo instead of figuring out what parts go to what repo in RepositoriesService. Otherwise we needlessly waste cycles on the repos that don't require the cluster state (e.g. read-only repos, ccr, ...). Also, in #49060 the parts of the CS that need to be inspected will change, but we may want to keep the current logic from here as BwC fallback which is much easier if we just pass the full ClusterState down.
Also, I realized that I actually had to move the cluster state application method to Repository because of FilterRepository (and upcoming encrypted repo wrapper etc.) so this didn't leak any blob store repo specifics to RepositoriesService after all :)
final SnapshotsInProgress snapshotsInProgress = state.custom(SnapshotsInProgress.TYPE);
if (snapshotsInProgress != null) {
    final SnapshotsInProgress.Entry entry = snapshotsInProgress.entries().stream()
        .filter(e -> e.snapshot().getRepository().equals(repoName)).findFirst()
why findFirst? Let's take the max of all ongoing snapshots for this repo?
Well, currently there's just one entry here at all times. I don't think that will change before this logic gets replaced by the more involved logic in #49060, and I'm not sure that, if and when we move to parallel snapshot taking, those entries will in fact have different repository generations set (in my prototype for parallel ops they wouldn't have, at least). Could just assert that the count of snapshots in progress is always 1 here? :)
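Yannick's suggestion of taking the max rather than findFirst could look roughly like the following (a simplified sketch: the record below is a hypothetical stand-in for the real SnapshotsInProgress.Entry, which exposes snapshot().getRepository() and a repository state id):

```java
import java.util.List;

class MaxGenExample {
    // Hypothetical stand-in for SnapshotsInProgress.Entry.
    record Entry(String repository, long repositoryStateId) {}

    // Take the highest generation among all in-progress snapshots for this
    // repository, instead of relying on findFirst() returning the only entry.
    static long bestGeneration(List<Entry> entries, String repoName, long emptyRepoGen) {
        return entries.stream()
            .filter(e -> e.repository().equals(repoName))
            .mapToLong(Entry::repositoryStateId)
            .max()
            .orElse(emptyRepoGen);
    }
}
```

With at most one entry per repository today the two approaches agree; the max-based form simply stays correct if parallel snapshots ever carry differing generations.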
}

final SnapshotDeletionsInProgress deletionsInProgress = state.custom(SnapshotDeletionsInProgress.TYPE);
if (bestGenerationFromCS == RepositoryData.EMPTY_REPO_GEN && deletionsInProgress != null) {
let's remove this extra condition (bestGenerationFromCS == RepositoryData.EMPTY_REPO_GEN); it's an optimization which does not matter I think and could hurt us later.
No, actually that's not true I think. We have the situation where, if you abort a snapshot, the delete entry will be created with currentRepoGeneration + 1, since that's what the repo generation will be when the delete actually runs. That's why I added this.
I don't understand this condition. Why are we not just taking the max of all the entries that we see? Also, I suppose that we don't allow concurrent operations right now, so we assume that we have only one of these 3 metadata for the current repo?
There is the one special case of aborting a snapshot, where you can have an in-progress snapshot and a delete at the same time. The delete will contain the future repository generation for after the snapshot finishes -> we can't use that. That's why I added this condition.
Can we fix that situation? And can you add a comment to that effect? The condition as it stands here today is very unintuitive.
Can we fix that situation?
Yes, once we have #49060 we can clean this up when implementing concurrent operations on the repo. That will require the whole business of associating an operation with a strict repo generation to go away anyway.
=> Added a comment for now :)
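The special case discussed above (an aborted snapshot producing a delete entry that carries currentRepoGeneration + 1, a generation that does not exist in the repository yet) is why the deletion entry is only consulted when nothing better is known. A hedged sketch of that decision, with hypothetical names:

```java
// Sketch of the guard discussed above; names and constants are illustrative.
class DeletionGenGuard {
    static final long EMPTY_REPO_GEN = -1L;

    // Only fall back to the generation recorded in a deletion entry when no
    // in-progress snapshot supplied a generation: an aborted snapshot's delete
    // entry points one generation into the future, so it must not win over a
    // generation taken from a snapshot entry.
    static long pick(long bestGenerationFromCS, Long deletionRepositoryStateId) {
        if (bestGenerationFromCS == EMPTY_REPO_GEN && deletionRepositoryStateId != null) {
            return deletionRepositoryStateId;
        }
        return bestGenerationFromCS;
    }
}
```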
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java (resolved comment)
tlrx left a comment
This looks good already, I'm waiting for Yannick's feedback to be addressed before LGTM :)
...y-azure/src/test/java/org/elasticsearch/repositories/azure/AzureRepositorySettingsTests.java (resolved comment)
    throw new RepositoryException(repositoryMetaData.name(),
        "repository type [" + repositoryMetaData.type() + "] does not exist");
}
boolean success = false;
This isn't strictly necessary with the changes just now, but maybe fine to keep it here since we discovered it here, and I think we should definitely clean up on a failed start out of principle?
}
repositories = Collections.unmodifiableMap(builder);
for (Repository repo : repositories.values()) {
    repo.updateState(state);
I think we need to catch a potential RepositoryException here, log it and continue to update the other repositories.
I misread the code, sorry... But it maybe still worth it, just in case?
I wonder, looking at the code above: we aren't defensive around closeRepository either (though that might be called in a loop as well), but we do catch (with a TODO about it) on createRepository.
Maybe we should rather wrap this in a

try {
    repo.updateState(state);
} catch (Exception e) {
    assert false;
    throw e;
}

and do the same for closeRepository? Could do it in a follow-up and remove that TODO while we're at it :)
My main objection to just catching and logging here would be: once the state update is important for the proper functioning of blob store repositories, what would an uncaught exception even mean? (IMO it would mean that the repo can't be used for writes any longer ... but that would be something the repo would have to set in its internal state when handling exceptions.)
The bigger issue I think is to get rid of the condition:
if ((oldMetaData == null && newMetaData == null) || (oldMetaData != null && oldMetaData.equals(newMetaData))) {
which currently compares the current with the previous cluster state. If anything went wrong applying the previous cluster state, we will not update the repo again.
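The pitfall Yannick describes can be paraphrased in a small sketch (hypothetical names; the real check lives in RepositoriesService.applyClusterState and compares RepositoriesMetaData from the old and new cluster states):

```java
import java.util.Objects;

// Paraphrase of the short-circuit discussed above; names are illustrative.
class MetaDataSkipCheck {
    // If the repositories metadata did not change between the previous and the
    // current cluster state, the whole repository update is skipped. The pitfall:
    // if applying the *previous* state failed partway, this check still sees
    // "no change" and the repositories are never brought back in sync.
    static boolean shouldSkipUpdate(Object oldMetaData, Object newMetaData) {
        return Objects.equals(oldMetaData, newMetaData);
    }
}
```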
tlrx left a comment
LGTM, thanks Armin. I left a small comment where I think we should be defensive just in case an exception is thrown while updating the repositories states.
Thanks Armin! As discussed, I think we should have a look at the other issues in the applyClusterState() method (like maybe only creating + updating repositories before closing the existing ones); that could be done in a follow-up PR.
Jenkins run elasticsearch-ci/1 (unrelated transport test failure)
server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java (resolved comment)
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java (resolved comment)
Jenkins run elasticsearch-ci/bwc (seems like a messed-up version got built ...)
Something's messed up with the BwC tests here (some version constant test), not related to my changes.
Thanks Yannick + Tanguy!